Hypertree Decompositions and Tractable Queries 






Georg Gottlob, Nicola Leone, Francesco Scarcello 

Institut für Informationssysteme
Technische Universität Wien
Paniglgasse 16, 1040 Wien 

E-mail: {gottlob,leone,scarcell}@dbai.tuwien.ac.at 






November 1998 



DBAI-TR 98/21 






HYPERTREE DECOMPOSITIONS AND TRACTABLE QUERIES 



Georg Gottlob Nicola Leone Francesco Scarcello * 

Institut für Informationssysteme,
Technische Universität Wien
A-1040 Wien, Paniglgasse 16, Austria

{gottlob, leone, scarcell}@dbai.tuwien.ac.at

DBAI-TR 98/21, November 1998 



Abstract 

Several important decision problems on conjunctive queries (CQs) are NP-complete in general but
become tractable, and actually highly parallelizable, if restricted to acyclic or nearly acyclic queries.
Examples are the evaluation of Boolean CQs and query containment. These problems were shown tractable
for conjunctive queries of bounded treewidth [7] and of bounded degree of cyclicity [18, 17]. Until now,
the most general concept of nearly acyclic queries was the notion of queries of bounded query-width
introduced by Chekuri and Rajaraman [7]. While CQs of bounded query width are tractable, it remained
unclear whether such queries are efficiently recognizable. Chekuri and Rajaraman stated as an open
problem whether for each constant k it can be determined in polynomial time if a query has query width
$\le k$. We give a negative answer by proving this problem NP-complete (specifically, for k = 4). In order
to circumvent this difficulty, we introduce the new concept of hypertree decomposition of a query and the
corresponding notion of hypertree width. We prove: (a) for each k, the class of queries with query width
bounded by k is properly contained in the class of queries whose hypertree width is bounded by k; (b)
unlike query width, constant hypertree-width is efficiently recognizable; (c) Boolean queries of constant
hypertree width can be efficiently evaluated.



*Partially supported by the Istituto per la Sistemistica e l'Informatica of the Italian National Research Council (ISI-CNR), under
grant n.224.07.5
1 Introduction and Overview of Results



1.1 Conjunctive Queries 

One of the simplest but also one of the most important classes of database queries is the class of conjunctive
queries (CQs). In this paper we adopt the logical representation of a relational database [29, 1], where data
tuples are identified with logical ground atoms, and conjunctive queries are represented as datalog rules. We
will, in the first place, deal with Boolean conjunctive queries (BCQs) represented by rules whose heads are
variable-free, i.e., propositional (see Example 1.1 below). From our results on Boolean queries, we are able
to derive complexity results on important database problems concerning general (not necessarily Boolean)
conjunctive queries.

Example 1.1 Consider a relational database with the following relation schemas: 

enrolled(Pers#, Course#, Reg_Date)
teaches(Pers#, Course#, Assigned)
parent(Pers1, Pers2)

The BCQ $Q_1$ below checks whether some student is enrolled in a course taught by his/her parent.

$Q_1$: $ans \leftarrow enrolled(S, C, R) \wedge teaches(P, C, A) \wedge parent(P, S)$.

The following query $Q_2$ asks: Is there a professor who has a child enrolled in some course?

$Q_2$: $ans \leftarrow teaches(P, C, A) \wedge enrolled(S, C', R) \wedge parent(P, S)$.

Decision problems such as the evaluation problem of Boolean CQs, the query-of-tuple problem (i.e., check- 
ing whether a given tuple belongs to a CQ), and the containment problem for CQs have been studied inten- 
sively. (For recent references, see [ p0[ ^].) These problems - which are all equivalent via simple logspace 
transformations (see [13]) - are NP-complete in the general setting but are polynomially solvable for a number 
of syntactically restricted subclasses. 



1.2 Acyclic Queries and Join Trees 

Most prominent among the polynomial cases is the class of acyclic queries or tree queries [ |32[ || |l2[ [33|, H 
[9L [K], |22h. A query Q is acyclic if its associated hypergraph H(Q) is acyclic, otherwise Q is cyclic. The 
vertices of H(Q) are the variables occurring in Q. Denote by atoms(Q) the set of atoms in the body of Q, 
and by var(A) the variables occurring in any atom $A \in atoms(Q)$. The hyperedges of H(Q) consist of all
sets var(A), such that $A \in atoms(Q)$. We refer to the standard notion of cyclicity/acyclicity in hypergraphs
used in database theory [ pl| , [29| , p. 
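The standard notion of hypergraph acyclicity referred to above can be tested with the textbook GYO reduction (repeatedly delete vertices that occur in only one hyperedge, and hyperedges that are empty or contained in another hyperedge). The following is a minimal sketch of that well-known procedure, not an algorithm taken from this paper; the function name `is_acyclic` and the encoding of hyperedges as Python sets are assumptions made for illustration.

```python
def is_acyclic(hyperedges):
    """GYO-style reduction: True iff the hypergraph reduces to nothing."""
    edges = [set(e) for e in hyperedges]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every vertex that occurs in exactly one hyperedge.
        for e in edges:
            for v in list(e):
                if sum(v in f for f in edges) == 1:
                    e.discard(v)
                    changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


# Hyperedges var(A) for the atoms of a query in which enrolled, teaches and
# parent share S, C and P pairwise (illustrative variable names).
triangle = [{"S", "C", "R"}, {"P", "C", "A"}, {"P", "S"}]
# Same atoms, but enrolled uses a course variable of its own (here "C2").
chain = [{"P", "C", "A"}, {"S", "C2", "R"}, {"P", "S"}]
print(is_acyclic(triangle), is_acyclic(chain))
```

The sketch is quadratic at best and is meant only to make the definition concrete; it is not the linear-time method mentioned later in the text.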

A join tree JT(Q) for a conjunctive query Q is a tree whose vertices are the atoms in the body of Q such
that whenever the same variable X occurs in two atoms $A_1$ and $A_2$, then $A_1$ and $A_2$ are connected in JT(Q),
and X occurs in each atom on the unique path linking $A_1$ and $A_2$. In other words, the set of nodes in which
X occurs induces a (connected) subtree of JT(Q). We will refer to this condition as the Connectedness
Condition of join trees.

Acyclic queries can be characterized in terms of join trees: A query Q is acyclic iff it has a join tree [|3[ 

Example 1.2 While query $Q_1$ of Example 1.1 is cyclic and admits no join tree, query $Q_2$ is acyclic. A join
tree for $Q_2$ is shown in Figure 1.
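The Connectedness Condition can be checked mechanically. The sketch below is an illustration under stated assumptions, not code from the paper: atoms are identified by hypothetical names (two atoms over the same relation would need distinct keys), each mapped to its variable set, and a candidate tree is given as a list of undirected edges. The course variable of the enrolled atom in the acyclic query is written `C2` here.

```python
from collections import defaultdict


def _connected(nodes, adj):
    """True iff `nodes` induces a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adj[n] & nodes:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes


def is_join_tree(atoms, tree_edges):
    """atoms: name -> set of variables; tree_edges: list of (name, name)."""
    adj = defaultdict(set)
    for a, b in tree_edges:
        adj[a].add(b)
        adj[b].add(a)
    # A tree on the atoms: connected, with exactly |atoms| - 1 edges.
    if len(tree_edges) != len(atoms) - 1 or not _connected(atoms, adj):
        return False
    variables = set().union(*atoms.values())
    # Connectedness Condition: the atoms containing x form a subtree.
    return all(
        _connected([a for a, vs in atoms.items() if x in vs], adj)
        for x in variables
    )


acyclic_query = {
    "teaches": {"P", "C", "A"},
    "enrolled": {"S", "C2", "R"},
    "parent": {"P", "S"},
}
print(is_join_tree(acyclic_query,
                   [("parent", "teaches"), ("parent", "enrolled")]))
```

For three atoms that pairwise share a variable, each of the three possible trees fails the check, which matches the statement that a cyclic query admits no join tree.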

Acyclic conjunctive queries have highly desirable computational properties: 
Figure 1: A join tree of $Q_2$



(1) The problem BCQ of evaluating a Boolean conjunctive query can be efficiently solved if the input query
is acyclic. Yannakakis provided a (sequential) polynomial time algorithm solving BCQ on acyclic conjunctive
queries [32]. The authors of the present paper have recently shown that BCQ is highly parallelizable on
acyclic queries, as it is complete for the low complexity class LOGCFL [13].

(2) Acyclicity is efficiently recognizable, and a join tree of an acyclic query is efficiently computable. A
linear-time algorithm for computing a join tree is shown in [28]; an $L^{SL}$ method has been provided in [13].

(3) The result of a (non-Boolean) acyclic conjunctive query Q can be computed in time polynomial in the
combined size of the input instance and of the output relation [32].

Intuitively, the efficient behaviour of Boolean acyclic queries is due to the fact that they can be evaluated 
by processing the join tree bottom-up by performing upward semijoins, thus keeping small the size of the 
intermediate relations (that could become exponential if regular joins were performed). This method is the
Boolean version of Yannakakis's evaluation algorithm for general conjunctive queries [32].
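The bottom-up semijoin pass described above can be sketched as follows. This is a simplified illustration, not the authors' or Yannakakis's exact formulation: it assumes a rooted join tree is already given, that atoms contain no constants and no repeated variables, and that relations are lists of tuples. The names `semijoin` and `evaluate_boolean` and the sample data are hypothetical.

```python
def semijoin(r_vars, r_rows, s_vars, s_rows):
    """Keep the rows of r that agree with some row of s on shared variables."""
    shared = [v for v in r_vars if v in s_vars]
    r_idx = [r_vars.index(v) for v in shared]
    s_idx = [s_vars.index(v) for v in shared]
    keys = {tuple(row[i] for i in s_idx) for row in s_rows}
    return [row for row in r_rows if tuple(row[i] for i in r_idx) in keys]


def evaluate_boolean(tree, root, atoms, db):
    """tree: node -> list of children; atoms: node -> (relation, variables);
    db: relation -> list of tuples. Returns the truth value of the query."""
    def reduce(node):
        rel, variables = atoms[node]
        rows = list(db[rel])
        for child in tree.get(node, []):
            # Upward semijoin: filter the parent by its already reduced child.
            rows = semijoin(variables, rows, atoms[child][1], reduce(child))
        return rows
    # The query is true iff the fully reduced root relation is non-empty.
    return bool(reduce(root))


atoms = {
    "p": ("parent", ("P", "S")),
    "t": ("teaches", ("P", "C", "A")),
    "e": ("enrolled", ("S", "C2", "R")),
}
tree = {"p": ["t", "e"]}
db = {
    "parent": [("bob", "ann")],
    "teaches": [("bob", "logic", "1997")],
    "enrolled": [("ann", "db", "1998")],
}
print(evaluate_boolean(tree, "p", atoms, db))
```

Each intermediate result is a subset of one input relation, which is the reason the intermediate sizes stay small.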

Acyclicity is a key-property responsible for the polynomial solvability of problems that are in general NP- 
hard such as BCQ [^] and other equivalent problems such as Conjunctive Query Containment fl23|, f7|], Clause 



Subsumption, and Constraint Satisfaction [20, 13]. (For a survey and detailed treatment see [13].)



1.3 Queries of Bounded Width 

The tremendous speed-up obtainable in the evaluation of acyclic queries stimulated several research efforts 
towards the identification of wider classes of queries having the same desirable properties as acyclic queries. 
These studies identified a number of relevant classes of cyclic queries which are close to acyclic queries, 
because they can be decomposed via low width decompositions to acyclic queries. The main classes of 
polynomially solvable bounded-width queries considered in database theory and in artificial intelligence are: 

• The queries of bounded treewidth [f7|] (see also [|(], [Bp). These are queries, whose variable-atom 
incidence graph has treewidth bounded by a constant.^] The treewidth of a graph is a well-known mea- 
sure of its tree-likeness introduced by Robertson and Seymour in their work on graph minors This 
notion plays a central role in algorithmic graph theory as well as in many subdisciplines of Computer 
Science. We omit a formal definition. It is well known that checking that a graph has treewidth $\le k$
for a fixed constant k, and in the positive case, computing a k-width tree decomposition is feasible in
linear time [ft|. 



• Queries of bounded degree of cyclicity [18, 17]. This is an interesting class of queries which also
encompasses the class of acyclic queries. For space reasons, we omit a formal definition. For each
constant k, checking whether a query has degree of cyclicity $\le k$ is feasible in polynomial time [18, 17].



1 Note that, since both the database DB and the query Q are part of an input-instance of BCQ, what we are considering is the
combined complexity of the query [pip. 

2 As pointed out in [pc|], the notion of treewidth of a query can be equivalently based on the Gaifman graph of a query, i.e., the 
graph linking two variables by an edge if they occur together in a query-atom. 
Figure 2: A 2-width query decomposition of query $Q_1$



• Queries of bounded query-width [7]. This notion is based on the concept of
query decomposition [7]. Roughly, a query decomposition of a query Q consists of a tree each vertex of
which is labelled by a set of atoms and/or variables. Each variable and atom induces a connected subtree
(connectedness condition). Each atom occurs in at least one label. The width of a query decomposition
is the maximum of the cardinalities of its vertex labels. The query width qw(Q) of Q is the minimum width
over all its query decompositions. A formal definition is given in Section 3.1; Figure 2 shows a 2-width
query decomposition for the cyclic query $Q_1$ of Example 1.1. This class is the widest of the three
classes: each query of bounded treewidth or of bounded degree of cyclicity k also has bounded query
width k, but for some queries the converse does not hold [7, 13]. There are even classes of queries
with bounded query width but unbounded treewidth. Note, however, that no polynomial algorithm for
checking whether a query has query width $\le k$ was known.

All these concepts are true generalizations of the basic concept of acyclicity. For example, a query is acyclic
iff it has query width 1.

Intuitively, a vertex of a k-width query decomposition stands for the natural join of (the relations of) its
elements; the size of this join is $O(n^k)$, where n is the size of the input database. Once these joins have
been done, the query decomposition can be treated exactly like a join tree of an acyclic query, and permits
evaluating the query in time polynomial in $n^k$ [7].

The problem BCQ (evaluation of Boolean conjunctive queries) and the bounded query-width versions of all
mentioned equivalent problems, e.g. query containment $Q_1 \subseteq Q_2$, where the query width of $Q_2$ is bounded,
can be efficiently solved if a k-width query decomposition of the query is given as (additional) input. Chekuri
and Rajaraman provided a polynomial time algorithm for this problem [7]; Gottlob et al. [13] later pinpointed
the precise complexity of the problem by proving it LOGCFL-complete.



1.4 A Negative Result

Unfortunately, unlike for acyclicity, bounded treewidth, or bounded degree of cyclicity, no efficient
method for checking bounded query-width is known, and a k-width query decomposition, which is required
for the efficient evaluation of a bounded-width query, is not known to be polynomial time computable.

Chekuri and Rajaraman [7] state this as an open problem. This problem is the first question we address in
the present paper.

The fact that treewidth k can be checked in linear time suggests that an analogous algorithm may work
for query width, too. Chekuri and Rajaraman [7] express some optimism by writing "it would be useful to
have an efficient algorithm that produces query decompositions of small width, analogous to the algorithm 
of Bodlaender /Q/ for decompositions of small treewidth." Kolaitis and Vardi [20] write somewhat more 
pessimistically: "there is an important advantage of the concept of bounded treewidth over the concept of
bounded querywidth. Specifically, as seen above, the classes of structures of bounded treewidth are polynomially
recognizable, whereas it is not known whether the same holds true for the classes of queries of bounded
querywidth".

Our first main result is bad news: 
Determining whether the query width of a conjunctive query is at most 4 is NP-complete.



The NP-completeness proof is rather involved and will be given in Section 3.2. We give a rough intuition in
this section. The proof led us to a better intuition about (i) why the problem is NP-complete, and (ii) how this 
could be redressed by suitably modifying the notion of query width. Very roughly, the source of NP-hardness 
can be pinpointed as follows. In order to obtain a query-decomposition of width bounded by k, it can be 
seen that it is implicitly required to proceed as follows. At any step, the decomposition is guided by a set 
C of variables that still needs to be processed. Initially, i.e., at the root of the decomposition, C consists of 
all variables that occur in the query. We then choose as root of the decomposition tree a hypernode R of
$\le k$ query atoms. By fixing this hypernode, we eliminate a set of variables, namely those which occur in the
atoms of R. The remaining set of variables disintegrates into connected components. We now expand the 
decomposition tree by attaching children, and thus, in the long run, subtrees to R. It can be seen that each 
subtree rooted in R must correspond to one or more of the connected components and each component occurs 
in exactly one subtree (otherwise the connectedness condition would be violated). In particular, since each 
atom should be eventually covered, each remaining atom must be covered by some subtree, i.e., must occur in 
some subtree. This process goes on until all variables are eliminated and until all query atoms are eventually 
covered. The definition of query width requires that this covering be exact, i.e., that each atom containing a 
variable of a certain component C occurs only in the subtree corresponding to C. (Again, the requirement of 
exact covering is due to the connectedness condition.) The NP hardness is due to this requirement of exact 
covering. In our NP-completeness proof, we thus tried to reduce the problem of EXACT COVERING BY
3-SETS to the query width problem, and this attempt was successful.

1.5 Hypertree Decompositions: Positive Results 

To circumvent the high complexity of query-decompositions, we introduce a new concept of decomposition, 
which we call hypertree decomposition. The definition of hypertree decomposition (see Section 4) uses a more
liberal notion of covering. When choosing a set SC of components to be processed, the variables of SC are 
no longer required to exactly coincide with the variables occurring in the labels of the decomposition-subtree 
corresponding to SC, but it is sufficient that the former be a subset of the latter. Based on this more liberal 
notion of decomposition, we define the corresponding notion of hypertree width in analogy to the concept of 
query width. 

We denote the query width of a query by qw(Q) and its hypertree width by hw(Q). We are able to prove 
the following results: 
1. For each conjunctive query Q it holds that $hw(Q) \le qw(Q)$.

2. There exist queries Q such that $hw(Q) < qw(Q)$.

3. For each fixed constant k, the problems of determining whether $hw(Q) \le k$ and of computing
(in the positive case) a hypertree decomposition of width $\le k$ are feasible in polynomial time.

4. For fixed k, evaluating a Boolean conjunctive query Q with $hw(Q) \le k$ is feasible in polynomial
time.

5. The result of a (non-Boolean) conjunctive query Q of bounded hypertree-width can be computed
in time polynomial in the combined size of the input instance and of the output relation.

6. Tasks 3 and 4 are not only polynomial, but are highly parallelizable. In particular, for fixed k,
checking whether $hw(Q) \le k$ is in the parallel complexity class LOGCFL; computing a hypertree
decomposition of width k (if any) is in functional LOGCFL, i.e., is feasible by a logspace
transducer that uses an oracle in LOGCFL; evaluating Q where $hw(Q) \le k$ on a database is
complete for LOGCFL under logspace reductions.



Similar results hold for the equivalent problem of conjunctive query containment $Q_1 \subseteq Q_2$, where $hw(Q_2) \le k$,
and for all the other aforementioned equivalent problems.

Let us comment on these results. By statements 1 and 2, the concept of hypertree width is a proper general- 
ization of the notion of query width. By statement 3, constant hypertree-width is efficiently checkable, and by 
statement 4, queries of constant hypertree width can be efficiently evaluated. In summary, this is truly good 
news. It means that the notion of constant hypertree width not only shares the desirable properties of constant 
query-width, it also does not share the bad properties of the latter, and, in addition, is a more general concept. 

It thus turns out that the high complexity of determining constant query width is not, as one would usually 
expect, the price for the generality of the concept. Rather, it is due to some peculiarity in its definition related 
to the exact covering paradigm. In the definition of hypertree width we succeeded in eliminating these problems
without paying any additional charge, i.e., hypertree width comes as a freebie!

Statement 6 asserts that the main algorithmic tasks related to constant hypertree-width are in the very 
low complexity class LOGCFL. This class consists of all decision problems that are logspace reducible 
to a context-free language. An obvious example of a problem complete for LOGCFL is Greibach's hardest 
context-free language [jl^]. There is a number of very interesting natural problems known to be LOGCFL- 
complete (see, e.g. [^, 26, 13]). The relationship between LOGCFL and other well-known complexity classes 



is summarized in the following chain of inclusions: 

$AC^0 \subseteq NC^1 \subseteq L \subseteq SL \subseteq NL \subseteq LOGCFL \subseteq AC^1 \subseteq NC^2 \subseteq P \subseteq NP$

Here L denotes logspace, $AC^i$ and $NC^i$ are logspace-uniform classes based on the corresponding types of
Boolean circuits, SL denotes symmetric logspace, NL denotes nondeterministic logspace, P is polynomial
time, and NP is nondeterministic polynomial time. For the definitions of all these classes, and for references
concerning their mutual relationships, see [19].



Since $LOGCFL \subseteq AC^1 \subseteq NC^2$, the problems in LOGCFL are all highly parallelizable. In fact, they are
solvable in logarithmic time by a CRCW PRAM with a polynomial number of processors, or in $\log^2$-time by
an EREW PRAM with a polynomial number of processors.

1.6 Structure of the Paper 

Basic notions of database and complexity theory are given in Section 2. Section 3 deals with query decompo- 
sitions and includes the NP-completeness proof for the problem of deciding bounded query-width. The new 
notions of hypertree decomposition and hypertree width are formally defined in Section 4, where some
examples are also given, and it is shown that bounded hypertree-width queries are efficiently evaluable. In Section
5, the alternating algorithm k-decomp that checks whether a query has hypertree width $\le k$ is presented.
This algorithm is shown to run on a logspace ATM having polynomially sized accepting computation trees,
thus the problem is actually in LOGCFL. Finally, a short sketch of a deterministic polynomial algorithm (in
the form of a datalog program) for checking whether a query has hypertree width $\le k$ is given. In Section 6 the
notion of hypertree decomposition is shown to be the most general among the most important related notions,
e.g., the notion of query decomposition.

2 Preliminaries 

2.1 Databases and Queries 

For a background on databases, conjunctive queries, etc., see p9| , [XL Ell]. We define only the most relevant 
concepts here. 

A relation schema R consists of a name (name of the relation) r and a finite ordered list of attributes. To
each attribute A of the schema, a countable domain Dom(A) of atomic values is associated. A relation
instance (or simply, a relation) over schema $R = (A_1, \dots, A_k)$ is a finite subset of the cartesian product
$Dom(A_1) \times \dots \times Dom(A_k)$. The elements of relations are called tuples. A database schema DS consists
of a finite set of relation schemas. A database instance, or simply database, DB over database schema
$DS = \{R_1, \dots, R_m\}$ consists of relation instances $r_1, \dots, r_m$ for the schemas $R_1, \dots, R_m$, respectively, and
a finite universe $U \subseteq \bigcup_{R_i(A^i_1, \dots, A^i_{k_i}) \in DS} (Dom(A^i_1) \cup \dots \cup Dom(A^i_{k_i}))$ such that all data values occurring in
DB are from U.

In this paper we will adopt the standard convention [1, 29] of identifying a relational database instance
with a logical theory consisting of ground facts. Thus, a tuple $(a_1, \dots, a_k)$, belonging to relation r, will be
identified with the ground atom $r(a_1, \dots, a_k)$. The fact that a tuple $(a_1, \dots, a_k)$ belongs to relation r of a
database instance DB is thus simply denoted by $r(a_1, \dots, a_k) \in DB$.

A (rule based) conjunctive query Q on a database schema $DS = \{R_1, \dots, R_m\}$ consists of a rule of the
form

$Q: ans(\mathbf{u}) \leftarrow r_1(\mathbf{u}_1) \wedge \dots \wedge r_n(\mathbf{u}_n)$,

where $n \ge 0$, $r_1, \dots, r_n$ are relation names (not necessarily distinct) of DS; ans is a relation name not in DS;
and $\mathbf{u}, \mathbf{u}_1, \dots, \mathbf{u}_n$ are lists of terms (i.e., variables or constants) of appropriate length. The set of variables
occurring in Q is denoted by var(Q). The set of atoms contained in the body of Q is referred to as atoms(Q).
Similarly, for any atom $A \in atoms(Q)$, var(A) denotes the set of variables occurring in A; and for a set of
atoms $R \subseteq atoms(Q)$, define $var(R) = \bigcup_{A \in R} var(A)$.

The answer of Q on a database instance DB with associated universe U consists of a relation ans whose
arity is equal to the length of $\mathbf{u}$, defined as follows: ans contains all tuples $ans(\mathbf{u})\vartheta$ such that $\vartheta : var(Q) \to U$
is a substitution replacing each variable in var(Q) by a value of U and such that for $1 \le i \le n$, $r_i(\mathbf{u}_i)\vartheta \in DB$.
(For an atom A, $A\vartheta$ denotes the atom obtained from A by uniformly substituting $\vartheta(X)$ for each variable
X occurring in A.)

The conjunctive query Q is a Boolean conjunctive query (BCQ) if its head atom $ans(\mathbf{u})$ does not contain
variables and is thus a purely propositional atom. Q evaluates to true if there exists a substitution $\vartheta$ such that
for $1 \le i \le n$, $r_i(\mathbf{u}_i)\vartheta \in DB$; otherwise the query evaluates to false.

The head literal in Boolean conjunctive queries is actually inessential, therefore we may omit it when 
specifying a Boolean conjunctive query. 
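The semantics of a Boolean conjunctive query can be read directly as a search for a satisfying substitution. The sketch below is a naive backtracking evaluator that follows the definition literally; it is exponential in general and is not one of the efficient methods discussed in this paper. It assumes, for illustration only, that a body is a list of `(relation, terms)` pairs, that variables are strings starting with an upper-case letter (the usual datalog convention), and that every other term is a constant.

```python
def is_var(term):
    # Assumed convention: variables start with an upper-case letter.
    return isinstance(term, str) and term[:1].isupper()


def holds(body, db, theta=None):
    """True iff some substitution maps every body atom to a fact of db."""
    theta = {} if theta is None else theta
    if not body:
        return True
    (rel, terms), rest = body[0], body[1:]
    for row in db.get(rel, ()):
        if len(row) != len(terms):
            continue
        extended = dict(theta)
        # A variable must keep its earlier binding; a constant must match.
        if all(
            extended.setdefault(t, v) == v if is_var(t) else t == v
            for t, v in zip(terms, row)
        ) and holds(rest, db, extended):
            return True
    return False


same_course = [
    ("enrolled", ("S", "C", "R")),
    ("teaches", ("P", "C", "A")),
    ("parent", ("P", "S")),
]
db = {
    "enrolled": [("ann", "db", "1998")],
    "teaches": [("bob", "db", "1997")],
    "parent": [("bob", "ann")],
}
print(holds(same_course, db))
```

Since the head of a Boolean query carries no variables, the function takes only the body, in line with the remark that the head may be omitted.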

Note that conjunctive queries as defined here correspond to conjunctive queries in the more classical setting
of relational calculus, as well as to SELECT-PROJECT-JOIN queries in the classical setting of relational
algebra, or to simple SQL queries of the type

SELECT $R_{i_1}.A_{j_1}, \dots, R_{i_k}.A_{j_k}$ FROM $R_1, \dots, R_n$ WHERE cond,

such that cond is a conjunction of conditions of the form $R_i.A = R_j.B$ or $R_i.A = c$, where c is a constant.

A query Q is acyclic ||] if its associated hypergraph H(Q) is acyclic, otherwise Q is cyclic. The vertices 
of H(Q) are the variables occurring in Q. Denote by atoms(Q) the set of atoms in the body of Q, and by 
var(A) the variables occurring in any atom $A \in atoms(Q)$. The hyperedges of H(Q) consist of all sets
var(A), such that $A \in atoms(Q)$. We refer to the standard notion of cyclicity/acyclicity in hypergraphs used
in database theory ^ ||, |]. 

A join tree JT(Q) for a conjunctive query Q is a tree whose vertices are the atoms in the body of Q such
that whenever the same variable X occurs in two atoms $A_1$ and $A_2$, then $A_1$ and $A_2$ are connected in JT(Q),
and X occurs in each atom on the unique path linking $A_1$ and $A_2$. In other words, the set of nodes in which
X occurs induces a (connected) subtree of JT(Q) (connectedness condition).

Acyclic queries can be characterized in terms of join trees: A query Q is acyclic iff it has a join tree 



Example 2.1 While query $Q_1$ of Example 1.1 is cyclic and admits no join tree, query $Q_2$ is acyclic. A join
tree for $Q_2$ is shown in Figure 1.
Consider the following query Q3. 



$ans \leftarrow r(Y, Z) \wedge g(X, Y) \wedge s(Y, Z, U) \wedge s(Z, U, W) \wedge t(Y, Z) \wedge t(Z, U)$

A join tree for $Q_3$ is shown in Figure 3.



Figure 3: A join tree of $Q_3$



Acyclic conjunctive queries have highly desirable computational properties: 

1. The problem BCQ of evaluating a Boolean conjunctive query can be efficiently solved if the input query
is acyclic. Yannakakis provided a (sequential) polynomial time algorithm solving BCQ on acyclic
conjunctive queries [32]. The authors of the present paper have recently shown that BCQ is highly
parallelizable on acyclic queries, as it is complete for the low complexity class LOGCFL [13].

2. Acyclicity is efficiently recognizable, and a join tree of an acyclic query is efficiently computable. A
linear-time algorithm for computing a join tree is shown in [28]; an $L^{SL}$ method has been provided in
[13].

3. The result of a (non-Boolean) acyclic conjunctive query Q can be computed in time polynomial in the
combined size of the input instance and of the output relation [32].



3 Note that, since both the database DB and the query Q are part of an input-instance of BCQ, what we are considering is the 
combined complexity of the query jpT[|. 
Intuitively, the efficient behaviour of Boolean acyclic queries is due to the fact that they can be evaluated
by processing the join tree bottom-up by performing upward semijoins, thus keeping small the size of the
intermediate relations (that could become exponential if regular joins were performed). This method is the
Boolean version of Yannakakis's evaluation algorithm for general conjunctive queries [32].

Acyclicity is a key-property responsible for the polynomial solvability of problems that are in general NP- 
hard such as BCQ [^] and other equivalent problems such as Conjunctive Query Containment p3|,[7[|, Clause 
Subsumption, and Constraint Satisfaction [20, 13]. (For a survey and detailed treatment see [13].)

2.2 The class LOGCFL 

LOGCFL consists of all decision problems that are logspace reducible to a context-free language. An obvious 



example of a problem complete for LOGCFL is Greibach's hardest context-free language Jig]. There are a 



number of very interesting natural problems known to be LOGCFL-complete (see, e.g. [ ]13| , |27| , [2q]). The 
relationship between LOGCFL and other well-known complexity classes is summarized in the following 
chain of inclusions: 

$AC^0 \subseteq NC^1 \subseteq L \subseteq SL \subseteq NL \subseteq LOGCFL \subseteq AC^1 \subseteq NC^2 \subseteq P \subseteq NP$

Here L denotes logspace, $AC^i$ and $NC^i$ are logspace-uniform classes based on the corresponding types of
Boolean circuits, SL denotes symmetric logspace, NL denotes nondeterministic logspace, P is polynomial
time, and NP is nondeterministic polynomial time. For the definitions of all these classes, and for references
concerning their mutual relationships, see [19].



Since, as mentioned in the introduction, $LOGCFL \subseteq AC^1 \subseteq NC^2$, the problems in LOGCFL are all
highly parallelizable. In fact, they are solvable in logarithmic time by a CRCW PRAM with a polynomial
number of processors, or in $\log^2$-time by an EREW PRAM with a polynomial number of processors.

In this paper, we will use an important characterization of LOGCFL by Alternating Turing Machines. We 
assume that the reader is familiar with the alternating Turing machine (ATM) computational model introduced 
by Chandra, Kozen, and Stockmeyer [ft]. Here we assume w.l.o.g. that the states of an ATM are partitioned 
into existential and universal states. 



As in [25], we define a computation tree of an ATM M on an input string w as a tree whose nodes are labeled
with configurations of M on w, such that the descendants of any non-leaf labeled by a universal (existential) 
configuration include all (resp. one) of the successors of that configuration. A computation tree is accepting 
if the root is labeled with the initial configuration, and all the leaves are accepting configurations. 

Thus, an accepting tree yields a certificate that the input is accepted. A complexity measure considered
by Ruzzo [25] for the alternating Turing machine is the tree-size, i.e. the minimal size of an accepting
computation tree.



Definition 2.2 ([25]) A decision problem $\mathcal{P}$ is solved by an alternating Turing machine M within simultaneous
tree-size and space bounds Z(n) and S(n) if, for every "yes" instance w of $\mathcal{P}$, there is at least one
accepting computation tree for M on w of size (number of nodes) $\le Z(n)$, each node of which represents a
configuration using space $\le S(n)$, where n is the size of w. (Further, for any "no" instance w of $\mathcal{P}$ there is
no accepting computation tree for M.)



Ruzzo [25] proved the following important characterization of LOGCFL:



Proposition 2.3 (Ruzzo [25]) LOGCFL coincides with the class of all decision problems recognized by ATMs
operating simultaneously in tree-size $O(n^{O(1)})$ and space $O(\log n)$.



3 Query Decompositions 
Figure 4: A 2-width query decomposition of query $Q_4$



3.1 Bounded Query Width and Bounded Query Decompositions 

The following definition of query decomposition is a slight modification of the original definition given by
Chekuri and Rajaraman [7]. Our definition is a bit more liberal because, for any conjunctive query Q, we
disregard the atom head(Q) as well as any constants occurring in Q. However, in this paper,
we will only deal with Boolean conjunctive queries without constants, for which the two notions coincide.

Definition 3.1 A query decomposition of a conjunctive query Q is a pair $(T, \lambda)$, where $T = (N, E)$ is a tree,
and $\lambda$ is a labeling function which associates to each vertex $p \in N$ a set $\lambda(p) \subseteq (atoms(Q) \cup var(Q))$, such
that the following conditions are satisfied:

1. for each atom A of Q, there exists $p \in N$ such that $A \in \lambda(p)$;

2. for each atom A of Q, the set $\{p \in N \mid A \in \lambda(p)\}$ induces a (connected) subtree of T;

3. for each variable $Y \in var(Q)$, the set

$\{p \in N \mid Y \in \lambda(p)\} \cup \{p \in N \mid Y \text{ occurs in some atom } A \in \lambda(p)\}$

induces a (connected) subtree of T.

The width of the query decomposition $(T, \lambda)$ is $\max_{p \in N} |\lambda(p)|$. The query width qw(Q) of Q is the minimum
width over all its query decompositions. A query decomposition for Q is pure if, for each vertex $p \in N$,
$\lambda(p) \subseteq atoms(Q)$.

Note that Condition 3 above is the analogue of the connectedness condition of join trees and thus we will refer 
to it as the Connectedness Condition, as well. 

Example 3.2 Figure 2 shows a 2-width query decomposition for the cyclic query $Q_1$ of Example 1.1.
Consider the following query $Q_4$.

$ans \leftarrow s(Y, Z, U) \wedge g(X, Y) \wedge t(Z, X) \wedge s(Z, W, X) \wedge t(Y, Z)$

$Q_4$ is a cyclic query, and its query width equals 2. A 2-width decomposition of $Q_4$ is shown in Figure 4. Note
that this query decomposition is pure.
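The three conditions of Definition 3.1 are easy to verify for a given candidate. The following sketch checks a pure query decomposition and returns its width. It is an illustration under stated assumptions: atoms are referred to by hypothetical names (`s1`, `s2`, `t1`, `t2`, `g`), labels contain atom names only, and the decomposition used in the example was constructed for this sketch, so it need not coincide with the one drawn in Figure 4.

```python
from collections import defaultdict


def _connected(nodes, adj):
    """True iff `nodes` induces a connected subgraph of the graph `adj`."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adj[n] & nodes:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes


def pure_decomposition_width(atoms, tree_edges, labels):
    """atoms: name -> variables; labels: vertex -> set of atom names.
    Returns the width if the conditions hold, otherwise None."""
    adj = defaultdict(set)
    for p, q in tree_edges:
        adj[p].add(q)
        adj[q].add(p)
    vertices = set(labels)
    if len(tree_edges) != len(vertices) - 1 or not _connected(vertices, adj):
        return None  # the underlying graph is not a tree
    for a in atoms:  # Conditions 1 and 2
        holders = {p for p in vertices if a in labels[p]}
        if not holders or not _connected(holders, adj):
            return None
    for x in set().union(*atoms.values()):  # Condition 3 (pure case)
        holders = {p for p in vertices
                   if any(x in atoms[a] for a in labels[p])}
        if not _connected(holders, adj):
            return None
    return max(len(labels[p]) for p in vertices)


q4 = {
    "s1": {"Y", "Z", "U"}, "g": {"X", "Y"}, "t1": {"Z", "X"},
    "s2": {"Z", "W", "X"}, "t2": {"Y", "Z"},
}
labels = {"p0": {"s1", "s2"}, "p1": {"g"}, "p2": {"t1"}, "p3": {"t2"}}
edges = [("p0", "p1"), ("p0", "p2"), ("p0", "p3")]
print(pure_decomposition_width(q4, edges, labels))
```

Checking a given decomposition is straightforward; the difficulty addressed in Section 3.2 lies in finding one of bounded width.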

The next proposition, which is proved elsewhere [13], shows that we can focus our attention on pure query
decompositions.

Proposition 3.3 ([13]) Let Q be a conjunctive query and $(T, \lambda)$ a c-width query decomposition of Q. Then,

1. there exists a pure c-width query decomposition $(T, \lambda')$ of Q;

2. $(T, \lambda')$ is logspace computable from $(T, \lambda)$.



Thus, by Proposition |3.3| , for any conjunctive query Q, qw(Q) < k if and only if Q has a pure c-width 
decomposition, for some c < k. 

$k$-bounded-width queries are queries whose query width is bounded by a fixed constant $k > 0$. The notion of bounded query-width generalizes the notion of acyclicity [7]. Indeed, acyclic queries are exactly the conjunctive queries of query width 1, because any join tree is a query decomposition of width 1.
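
Acyclicity itself is easy to test. As an illustration, here is a minimal sketch of the standard GYO reduction on the query hypergraph; this is a well-known textbook procedure rather than something taken from this paper, and the function name is an assumption. A query is acyclic exactly when the reduction deletes every hyperedge.

```python
def is_acyclic(atoms):
    """GYO reduction; `atoms` is a list of variable sets (one per query atom)."""
    edges = [set(a) for a in atoms]
    changed = True
    while changed:
        changed = False
        # Rule 1: drop variables that occur in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: drop one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    return not edges


# The atoms of Q4 from Example 3.2, and a simple path-shaped query.
Q4_atoms = [{"Y", "Z", "U"}, {"X", "Y"}, {"Z", "X"}, {"Z", "W", "X"}, {"Y", "Z"}]
path_atoms = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
print(is_acyclic(Q4_atoms), is_acyclic(path_atoms))
```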

Bounded-width queries share an important computational property with acyclic queries: BCQ can be efficiently solved on queries of $k$-bounded query-width, if a $k$-width query decomposition of the query is given as (additional) input. Chekuri and Rajaraman provided a polynomial time algorithm for this problem [7], while Gottlob et al. pinpointed that the precise complexity of the problem is LOGCFL.

Unfortunately, unlike for acyclicity, no efficient method for checking bounded query-width is known. In fact, we next prove that deciding whether a conjunctive query has a bounded-width query decomposition is NP-complete.



3.2 Recognizing bounded query-width is NP-complete 

A $k$-element-vertex of a query decomposition $(T, \lambda)$ is a vertex $v$ of $T$ such that $|\lambda(v)| = k$.

Lemma 3.4 Let $Q$ be a query having variable set $var(Q) = \Gamma \cup Rest$, where

$\Gamma = \{V_{ij} \mid 1 \le i < j \le 8\}$,

and $Rest$ is an arbitrary set of further variables. Assume the set $atoms(Q)$ contains as subset a set $\Pi = \{P_1, \ldots, P_8\}$ of 8 atoms, where, for $1 \le i \le 8$, $var(P_i) \cap \Gamma = \{V_{1i}, V_{2i}, \ldots, V_{i-1\,i}, V_{i\,i+1}, \ldots, V_{i8}\}$, i.e.,

$var(P_i) \cap \Gamma = \bigcup_{k<i} \{V_{ki}\} \cup \bigcup_{i<k} \{V_{ik}\}$,

and $\forall A \in atoms(Q) - \Pi : var(A) \cap \Gamma = \emptyset$.

If $Q$ admits a pure query decomposition $(T, \lambda)$ of width 4, then there exist two adjacent 4-element-vertices $p_1$ and $p_2$ of $T$ such that $\lambda(p_1) \cup \lambda(p_2) = \Pi$.

Proof. Assume this does not hold. Let $v$ be any vertex of $T$ such that $\Gamma \cap var(v) \ne \emptyset$. Let $R = \Pi - \lambda(v)$. The atoms of $R$ must occur somewhere in the tree $T$. By the connectedness condition, all atoms of $R$ must occur in the labels of some neighbours of $v$. Moreover, by our assumption, the atoms of $R$ are not contained in the label of a single neighbour of $v$. Thus, there exist two neighbours $v_1, v_2$ of $v$ and two different atoms $P_i, P_j \in R$ such that $P_i \in \lambda(v_1) - \lambda(v_2)$ and $P_j \in \lambda(v_2) - \lambda(v_1)$. Assume, w.l.o.g., that $i < j$. Then $V_{ij} \in var(v_1)$, $V_{ij} \in var(v_2)$, but $V_{ij} \notin var(v)$. This, however, violates the connectedness condition. Contradiction. $\Box$



Definition 3.5 Let $S$ be a set of $n$ elements. A 3-partition $\{S_a, S_b, S_c\}$ of $S$ consists of three nonempty subsets $S_a, S_b, S_c \subseteq S$ such that $S_a \cup S_b \cup S_c = S$, and $S_x \cap S_y = \emptyset$ for $x \ne y$ from $\{a, b, c\}$. The sets $S_a, S_b, S_c$ are referred to as classes.

A 3-Partitioning-System (short 3PS) $\Sigma$ on a base set $S$ is a set of 3-partitions of $S$:

$\Sigma = \{\{S_a^1, S_b^1, S_c^1\}, \{S_a^2, S_b^2, S_c^2\}, \ldots, \{S_a^m, S_b^m, S_c^m\}\}$,

where $\forall \sigma, \sigma' \in \Sigma : \sigma \ne \sigma' \Rightarrow \sigma \cap \sigma' = \emptyset$ (i.e., no class occurs in two or more elements of $\Sigma$).

We define $classes(\Sigma) := \bigcup_{\sigma \in \Sigma} \sigma$. The base set $S$ of $\Sigma$ is referred to as $base(\Sigma)$: $base(\Sigma) = \bigcup_{C \in classes(\Sigma)} C$.

A 3PS is strict if for all $S', S'', S''' \in classes(\Sigma)$ either $\{S', S'', S'''\} = \sigma$ for some $\sigma \in \Sigma$ or $S' \cup S'' \cup S''' \subset S$. In other words: the only way to obtain $S$ as a union of three classes is via the specified 3-partitions of $\Sigma$; any other union of three classes results in a proper subset of $S$.

A 3PS $\Sigma$ is referred to as an $(m, k)$-3PS if $|\Sigma| \ge m$ and $\forall C \in classes(\Sigma) : |C| \ge k$.
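
Strictness can be tested by brute force over all triples of classes. The sketch below is only an illustration of Definition 3.5; the function name and the two small example systems are assumptions made here, and the code presupposes that each given triple really is a 3-partition of the common base set.

```python
from itertools import combinations


def is_strict_3ps(system):
    """system: list of 3-partitions, each a tuple of three frozensets."""
    classes = [c for sigma in system for c in sigma]
    if len(set(classes)) != len(classes):
        return False                      # a class occurs twice: not a 3PS at all
    base = frozenset().union(*classes)
    given = {frozenset(sigma) for sigma in system}
    for triple in combinations(classes, 3):
        if frozenset(triple) in given:
            continue                      # one of the specified 3-partitions
        if frozenset().union(*triple) == base:
            return False                  # another triple of classes covers the base set
    return True


fs = frozenset
strict = [(fs({1, 2}), fs({3, 4}), fs({5, 6})), (fs({2, 3}), fs({4, 5}), fs({6, 1}))]
not_strict = [(fs({1}), fs({2}), fs({3, 4})), (fs({1, 2}), fs({3}), fs({4}))]
print(is_strict_3ps(strict), is_strict_3ps(not_strict))
```

In the second system the classes {1, 2} and {3, 4} come from different partitions but already cover the base set, so any triple containing both violates strictness.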



Lemma 3.6 For each $m > 0$ and $k > 0$ a strict $(m, k)$-3PS can be computed in polynomial time.

Proof. Fix $m$ and $k$. We will construct a set $S$ of the desired cardinality $n \le 27m^3 + 2m + 3k$ such that there exists an $(m, k)$-3PS for $S$.

In order to construct $S$, first start with a set $S_0$ of $2m$ elements. It is a trivial combinatorial fact that we can choose at least $m$ different 3-partitions of $S_0$ (recall that $m \ge 3$, thus the actual number of 3-partitions we can build is much higher). Thus, let $\Sigma_0$ be a (not necessarily strict) 3PS on base set $S_0$, such that $|\Sigma_0| = m$. We now basically have to achieve two goals: (1) we have to transform $\Sigma_0$ to a strict 3PS, and (2) we have to make sure that all classes have at least $k$ elements.

In order to achieve goal (1), let $New$ be a set of $27m^3$ fresh elements. Enumerate (e.g., in a FOR loop) all combinations of three sets $S_x^i, S_y^j, S_z^\ell \in classes(\Sigma_0)$ and check for each such triplet whether it violates the strictness condition. If so, choose a fresh (i.e., so far unused) element $a$ from $New$, insert it into $S_0$ and use it as follows to redress the situation: For any set $\sigma \in \Sigma_0$ in which none of $S_x^i, S_y^j, S_z^\ell$ occurs, insert $a$ into exactly one of the three classes (which one is irrelevant). This means that the so augmented $\sigma$ is a partition of the new $S_0$. On the other hand, for each $\sigma \in \Sigma_0$ such that $\sigma \cap \{S_x^i, S_y^j, S_z^\ell\} \ne \emptyset$, choose a class $C \in \sigma$ such that $C \notin \{S_x^i, S_y^j, S_z^\ell\}$ and insert $a$ into $C$. It follows that the triplet $S_x^i, S_y^j, S_z^\ell$ no longer violates the strictness condition w.r.t. the new set $S_0$ (containing $a$), because $a$ is not in the union $S_x^i \cup S_y^j \cup S_z^\ell$, and thus this union is a strict subset of $S_0$. Note that this operation never invalidates the strictness condition of any other triplet. After we have repeated this procedure for each triplet, we end up with a strict 3PS for the resulting set $S_0$. Call this resulting set $S^+$ and denote the resulting strict 3PS by $\Sigma^+$. Note that the set $S^+$ has less than $2m + (3m)^3 = 2m + 27m^3$ elements.

In order to achieve goal (2), we simply add a set $New'$ of $3k$ further fresh elements to $S^+$, obtaining $S^* = S^+ \cup New'$. We furthermore partition $New'$ into three sets $O_1$, $O_2$, and $O_3$ of equal cardinality $k$ and do the following for each 3-partition $\sigma = \{C_a, C_b, C_c\}$ of $\Sigma^+$, where $(C_a, C_b, C_c)$ is any arbitrarily chosen order of the elements of $\sigma$: perform $C_a := C_a \cup O_1$ and $C_b := C_b \cup O_2$ and $C_c := C_c \cup O_3$.

The resulting 3PS $\Sigma$ is strict, has $|\Sigma| = m$, $|base(\Sigma)| \le 27m^3 + 2m + 3k$, and each class $C$ of $classes(\Sigma)$ has $|C| \ge k$. We are done. Note that our method for generating a strict 3PS works in polynomial time. $\Box$
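
The construction in the proof can be followed step by step in code. The sketch below is one possible reading of it under stated assumptions: the initial system $\Sigma_0$ is chosen here as the partitions $\{2i\}, \{2i+1\}$ and "everything else" (the proof leaves this choice open), elements are integers, and the function name is hypothetical. It requires $m \ge 2$ so that the initial classes are nonempty and distinct.

```python
from itertools import combinations


def build_strict_3ps(m, k):
    """Start from m 3-partitions of 2m elements, repair strictness, then pad."""
    assert m >= 2 and k >= 1
    base = set(range(2 * m))
    # Assumed initial (possibly non-strict) system: {2i}, {2i+1}, and the rest.
    system = [[{2 * i}, {2 * i + 1}, base - {2 * i, 2 * i + 1}] for i in range(m)]
    fresh = 2 * m                                  # next unused element
    slots = [(i, j) for i in range(m) for j in range(3)]   # (partition, position)
    for triple in combinations(slots, 3):
        if len({i for i, _ in triple}) == 1:
            continue                               # the triple is a given partition
        union = set().union(*(system[i][j] for i, j in triple))
        if union != base:
            continue                               # already a proper subset
        # Goal (1): put a fresh element into one class of every partition,
        # always avoiding the three offending classes.
        for i in range(m):
            free = [j for j in range(3) if (i, j) not in triple]
            system[i][free[0]].add(fresh)
        base.add(fresh)
        fresh += 1
    # Goal (2): pad the three classes of every partition with k new elements each.
    pads = [set(range(fresh + t * k, fresh + (t + 1) * k)) for t in range(3)]
    for sigma in system:
        for j in range(3):
            sigma[j] |= pads[j]
    base |= set().union(*pads)
    return [tuple(frozenset(c) for c in sigma) for sigma in system], frozenset(base)


system, base = build_strict_3ps(3, 2)
print(len(system), len(base))
```

Each repaired triple keeps missing its own fresh element afterwards, because every fresh element is inserted exactly once; this mirrors the remark in the proof that a repair never invalidates another triplet.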



Theorem 3.7 Deciding whether the query width of a conjunctive query is at most 4 is NP-complete.

Proof. 1. Membership. It is easy to see that if there exists a query decomposition of width bounded by 4, then there also exists one of polynomial size (in fact, by a simple restructuring technique we can always remove identically labeled vertices from a decomposition tree, and thus for any conjunctive query $Q$ only $O(|atoms(Q) \cup var(Q)|^4)$ vertices need to be considered). Therefore, a query decomposition of width $\le 4$ can be found by a nondeterministic guess followed by a polynomial correctness check. The problem is thus in NP.

2. Hardness. We transform the well-known NP-complete problem EXACT COVER BY 3-SETS (XC3S) [11] to the problem of deciding whether, for a conjunctive query $Q$, $qw(Q) \le 4$ holds. An instance of EXACT COVER BY 3-SETS consists of a pair $I = (R, \Delta)$ where $R$ is a set of $r = 3s$ elements, and $\Delta$ is a collection of $m$ 3-element subsets of $R$. The question is whether we can select $s$ subsets out of $\Delta$ such that they form a partition of $R$.
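
For concreteness, the source problem of the reduction can be stated as a small brute-force search. This is only an illustrative sketch with an assumed function name; it takes exponential time in general, which is consistent with the problem being NP-complete.

```python
from itertools import combinations


def exact_cover_by_3_sets(R, subsets):
    """Return s = |R|/3 members of `subsets` that partition R, or None."""
    R = frozenset(R)
    s, rem = divmod(len(R), 3)
    if rem:
        return None
    for choice in combinations(subsets, s):
        # s sets whose sizes add up to |R| and whose union is R are pairwise disjoint.
        if sum(len(d) for d in choice) == len(R) and frozenset().union(*choice) == R:
            return list(choice)
    return None


fs = frozenset
R = {1, 2, 3, 4, 5, 6}
delta = [fs({1, 2, 3}), fs({3, 4, 5}), fs({4, 5, 6}), fs({1, 2, 6})]
print(exact_cover_by_3_sets(R, delta))
```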

Consider an instance $I = (R, \Delta)$ of XC3S. Let $\Delta = \{D_i \mid 1 \le i \le m\}$ and let $D_i = \{X_a^i, X_b^i, X_c^i\}$ for $1 \le i \le m$ (note that for $i \ne j$, some elements of $D_i$ and $D_j$ may coincide).

Generate a strict $(m + 1, 2)$-3PS $\Sigma = \{\sigma_0, \sigma_1, \ldots, \sigma_m\}$ on some base set $S = base(\Sigma)$. By Lemma 3.6 this can be done in polynomial time. Let $\sigma_i = \{S_a^i, S_b^i, S_c^i\}$ for $0 \le i \le m$.

Identify each element of $S$ with a separate variable and establish a fixed precedence order $\prec$ among the elements (variables) of $S$. If $S'$ is a subset of $S$, and $S' = \{Z_1, \ldots, Z_l\}$, where $Z_1 \prec Z_2 \cdots \prec Z_l$, then we will abbreviate the list of variables $Z_1, \ldots, Z_l$ by $S'$ in query atoms. For example, instead of writing $p(a, Z_1, \ldots, Z_l, b)$, we write $p(a, S', b)$.

In order to transform the given instance $I = (R, \Delta)$ of XC3S to a conjunctive query $Q$, let us first define the following sets of variables $\Gamma^\ell$ and $\Pi_i^\ell$, which are all taken to be disjoint from the variables in $S$.

For $0 \le \ell \le s$, let

$\Gamma^\ell = \{V_{ij}^\ell \mid 1 \le i < j \le 8\}$,

and for $1 \le i \le 8$, let

$\Pi_i^\ell = \{V_{1i}^\ell, V_{2i}^\ell, \ldots, V_{i-1\,i}^\ell, V_{i\,i+1}^\ell, \ldots, V_{i8}^\ell\}$.

Let $S'_a$ and $S''_a$ be two nonempty sets which partition $S_a^0$. (Such a partition exists because $S_a^0$ contains at least two elements.)

Define, for $0 \le \ell \le s$, the following sets of query atoms:

$BLOCKA^\ell = \{q(\Pi_1^\ell, S'_a, Z_\ell),\ p_a(\Pi_2^\ell, S''_a),\ p_b(\Pi_3^\ell, S_b^0),\ p_c(\Pi_4^\ell, S_c^0)\}$

$BLOCKB^\ell = \{q(\Pi_5^\ell, S'_a, Y_\ell),\ p_a(\Pi_6^\ell, S''_a),\ p_b(\Pi_7^\ell, S_b^0),\ p_c(\Pi_8^\ell, S_c^0)\}$,

where the $Y_\ell$ and $Z_\ell$ variables are distinct fresh variables not occurring in any previously defined set. We further define

$BLOCKSA = \bigcup_{0 \le \ell \le s} BLOCKA^\ell$, $BLOCKSB = \bigcup_{0 \le \ell \le s} BLOCKB^\ell$,

and $BLOCKS = BLOCKSA \cup BLOCKSB$.

Define, for $1 \le \ell \le s$:

$LINK_\ell = \{link(Y_{\ell-1}, Z_\ell)\}$ and $LINKS = \bigcup_{1 \le \ell \le s} LINK_\ell$.

Finally, define for each set $D_i = \{X_a^i, X_b^i, X_c^i\}$ of $\Delta$, $1 \le i \le m$, the set of atoms:

$\Omega[D_i] = \{s(X_a^i, S_a^i),\ s(X_b^i, S_b^i),\ s(X_c^i, S_c^i)\}$.

Let $\Omega = \bigcup_{1 \le i \le m} \Omega[D_i]$, and denote by $\Omega(D_i)$ the set of all atoms of $\Omega$ in which some variable of $D_i$ occurs, i.e.,

$\Omega(D_i) = \{s(X, \sigma) \in \Omega \mid X \in D_i\}$.

Let $Q$ be the query whose atom-set is $BLOCKS \cup LINKS \cup \Omega$.
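
To make the shape of $Q$ concrete, the atom set can be generated as follows. This is a hedged sketch: the function name, the string encoding of variables (`V{l}_{i}_{j}`, `Y{l}`, `Z{l}`), the predicate names `pa`, `pb`, `pc`, and the choice of $S'_a$ as the first element of $S_a^0$ are all assumptions made here for illustration. The 3PS is taken as input and its strictness is not checked.

```python
def build_query(s, D, sigma):
    """D: list of m 3-tuples (X_a^i, X_b^i, X_c^i); sigma: list of m+1 3-partitions
    (S_a^i, S_b^i, S_c^i), each class an ordered tuple of variable names."""
    def pi(l, i):
        # Pi_i^l: the seven variables V^l_{jk} (j < k) whose index pair contains i.
        return tuple(f"V{l}_{min(i, j)}_{max(i, j)}" for j in range(1, 9) if j != i)

    Sa, Sb, Sc = sigma[0]
    Sa1, Sa2 = Sa[:1], Sa[1:]          # S'_a and S''_a; needs |S_a^0| >= 2
    atoms = []
    for l in range(s + 1):
        atoms += [("q", pi(l, 1) + Sa1 + (f"Z{l}",)), ("pa", pi(l, 2) + Sa2),
                  ("pb", pi(l, 3) + Sb), ("pc", pi(l, 4) + Sc)]      # BLOCKA^l
        atoms += [("q", pi(l, 5) + Sa1 + (f"Y{l}",)), ("pa", pi(l, 6) + Sa2),
                  ("pb", pi(l, 7) + Sb), ("pc", pi(l, 8) + Sc)]      # BLOCKB^l
    atoms += [("link", (f"Y{l - 1}", f"Z{l}")) for l in range(1, s + 1)]   # LINKS
    for i, Di in enumerate(D, start=1):                              # Omega
        atoms += [("s", (x,) + cls) for x, cls in zip(Di, sigma[i])]
    return atoms


# Tiny instance: R = {x1, x2, x3}, s = 1, one 3-set, and a hand-made system.
sigma = [(("e1", "e2"), ("e3", "e4"), ("e5",)),
         (("e1", "e3"), ("e2", "e5"), ("e4",))]
atoms = build_query(1, [("x1", "x2", "x3")], sigma)
print(len(atoms))
```

The query has $8(s+1) + s + 3m$ atoms, so the transformation is clearly polynomial once the 3PS is available.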

We claim that $Q$ has query width 4 iff $I = (R, \Delta)$ is a positive instance of EXACT COVER BY 3-SETS.

Let us first prove the if part. Assume that there exist $s$ 3-sets $D^1, \ldots, D^s \in \Delta$ which exactly cover $R$, i.e., which form a partition of $R$. We describe a query decomposition $(T, \lambda)$ of $Q$.

The root $v_{a0}$ of $T$ is labeled by the set of atoms $BLOCKA^0$. The root has as unique child a vertex $v_{b0}$ labeled by $BLOCKB^0$.

The decomposition tree is continued as follows. For each $1 \le \ell \le s$, do the following.

• Create a vertex $v_{c\ell}$ labeled by $LINK_\ell \cup \Omega[D^\ell]$ and attach $v_{c\ell}$ as a child to $v_{b\,\ell-1}$.

• For each remaining atom $A$ of $\Omega(D^\ell)$, we create a new vertex, label it with $\{A\}$, and attach it as a leaf to $v_{c\ell}$. (Note that these remaining atoms, if any, stem from other elements of $\Delta$, given that a variable may occur in several 3-sets.)

• Then, create a vertex $v_{a\ell}$ of $T$, label it by the set of atoms $BLOCKA^\ell$, and attach it as a child of $v_{c\ell}$. The vertex $v_{a\ell}$, in turn, has as only child a vertex $v_{b\ell}$ labeled by $BLOCKB^\ell$.

It is not hard to check that $(T, \lambda)$ is indeed a valid query decomposition.

Let us now prove the only-if part. Assume $(T, \lambda)$ is a width 4 query decomposition of the above defined query $Q$. By Proposition 3.3, we also assume, w.l.o.g., that $(T, \lambda)$ is a pure query decomposition. Since $Q$ is connected, also $T$ is connected.

We observe a number of relevant facts and make some assumptions.

FACT 1: By Lemma 3.4, for each $0 \le \ell \le s$, there must exist adjacent vertices $v_{a\ell}$ and $v_{b\ell}$ such that

$\lambda(v_{a\ell}) \cup \lambda(v_{b\ell}) = BLOCKA^\ell \cup BLOCKB^\ell$.

FACT 2: It holds that $S \subseteq var(v_{a\ell})$ and $S \subseteq var(v_{b\ell})$. In fact, if this were not the case, then both vertices would miss variables from $S$, but since all variables of $S$ occur together in other pairs of adjacent vertices, this would violate the connectedness condition and is thus impossible.

FACT 3: From the latter, and from the fact that the sets $S'_a$, $S''_a$, $S_b^0$, and $S_c^0$ form a partition of $S$, it follows that each of the vertices $v_{a\ell}$ and $v_{b\ell}$ contains a $q$ atom, a $p_a$ atom, a $p_b$ atom, and a $p_c$ atom. Without loss of generality, we can thus make the following assumption.

ASSUMPTION: For $0 \le \ell \le s$ we have $Z_\ell \in var(v_{a\ell})$ and $Y_\ell \in var(v_{b\ell})$.

FACT 4: For $1 \le \ell \le s$, there exists a vertex $v_{c\ell}$ that lies on the unique path from $v_{b\,\ell-1}$ to $v_{a\ell}$ such that $\{Y_{\ell-1}, Z_\ell\} \subseteq var(v_{c\ell})$. This can be seen as follows. For any variable $\alpha$, an $\alpha$-path is a path $\pi$ in $T$ such that the variable $\alpha$ occurs in the label $\lambda(v)$ of any vertex $v$ of $\pi$. The atom $link(Y_{\ell-1}, Z_\ell)$ must belong to the set $\lambda(v'_{c\ell})$ of some vertex $v'_{c\ell}$ of $T$. Clearly, by the connectedness condition, $v'_{c\ell}$ is connected via a $Y_{\ell-1}$-path $\pi_b$ to $v_{b\,\ell-1}$ and by a $Z_\ell$-path $\pi_a$ to $v_{a\ell}$. Let $\pi$ denote the unique path from $v_{b\,\ell-1}$ to $v_{a\ell}$. Then $\pi$, $\pi_a$, and $\pi_b$ intersect at exactly one vertex. This is the desired vertex $v_{c\ell}$.

FACT 5: For $1 \le \ell \le s$, $S \subseteq var(v_{c\ell})$. Trivial, because $v_{c\ell}$ lies on a path from $v_{b\,\ell-1}$ to $v_{a\ell}$, and $S \subseteq var(v_{b\,\ell-1})$ and $S \subseteq var(v_{a\ell})$. The fact follows by the connectedness condition.

FACT 6: For $1 \le \ell \le s$, $link(Y_{\ell-1}, Z_\ell)$ belongs to $\lambda(v_{c\ell})$ and there exists an $i$ with $1 \le i \le m$ such that $\Omega[D_i] \subseteq \lambda(v_{c\ell})$; in summary, $\lambda(v_{c\ell}) = \{link(Y_{\ell-1}, Z_\ell)\} \cup \Omega[D_i]$. Let us prove this. By FACT 5 we know that all variables in $S$ must be covered by $v_{c\ell}$. However, it also holds that $\{Y_{\ell-1}, Z_\ell\} \subseteq var(v_{c\ell})$ (see FACT 4). To cover the latter variables, there are two alternative choices:

1. both atoms $q(\Pi_5^{\ell-1}, S'_a, Y_{\ell-1})$ and $q(\Pi_1^\ell, S'_a, Z_\ell)$ belong to $\lambda(v_{c\ell})$; or

2. the atom $link(Y_{\ell-1}, Z_\ell)$ belongs to $\lambda(v_{c\ell})$.

Choice 1 is impossible: there exist no two other atoms $A, B \in atoms(Q)$ such that $var(A) \cup var(B) \cup S'_a \supseteq S$. We are thus left with Choice 2. Since the atom $link(Y_{\ell-1}, Z_\ell)$ does not contain any variable from $S$, there must be three other atoms in $\lambda(v_{c\ell})$ that together cover $S$. An inspection of the available atoms shows that the only possibility of covering $S$ by three atoms is via some atom set $\Omega[D_i]$ for $1 \le i \le m$. The fact is proved.

FACT 7: For $1 \le i < j \le s$ it holds that $v_{ai}$ lies on the unique path in $T$ from $v_{ci}$ to $v_{cj}$.

Consider the edge $\{v_{ai}, v_{bi}\}$. If we cut this edge from the tree $T$, then we obtain two disconnected trees $T_a$ (containing $v_{ai}$) and $T_b$ (containing $v_{bi}$). Since $v_{ci}$ is connected via a $Z_i$-path to $v_{ai}$, but $Z_i$ does not occur in $var(v_{bi})$, it holds that $v_{ci}$ is contained in $T_a$. On the other hand, by "iterative" application of Fact 4 and of the connectedness condition it follows that there is a path $\pi$ from $v_{bi}$ to $v_{cj}$ such that for each vertex $v$ of $\pi$ it holds that $var(v) \cap Bigvars \ne \emptyset$, where $Bigvars = \{Y_h \mid i \le h < j\} \cup \{Z_h \mid i < h \le j\}$. Since $var(v_{ai}) \cap Bigvars = \emptyset$, $\pi$ does not traverse $v_{ai}$. It follows that $v_{cj}$ belongs to $T_b$. Therefore, the unique path linking $v_{ci}$ to $v_{cj}$ goes through the edge $\{v_{ai}, v_{bi}\}$, and thus contains the vertex $v_{ai}$.

FACT 8: For $0 < i < j \le s$ it holds that $var(v_{ci}) \cap var(v_{cj}) = S$. By Fact 7, $v_{ai}$ lies on the unique path from $v_{ci}$ to $v_{cj}$. Therefore, by the connectedness condition, it holds that $var(v_{ci}) \cap var(v_{cj}) \subseteq var(v_{ai})$. Moreover, by Fact 6, no variable from $var(v_{ai}) - S$ is contained in both $var(v_{ci})$ and $var(v_{cj})$. Thus $var(v_{ci}) \cap var(v_{cj}) \subseteq S$. On the other hand, by Fact 5, $S \subseteq var(v_{ci})$ and $S \subseteq var(v_{cj})$, hence $S \subseteq var(v_{ci}) \cap var(v_{cj})$. In summary, we obtain $var(v_{ci}) \cap var(v_{cj}) = S$.

For each $1 \le \ell \le s$, denote by $D^\ell$ the set $D_i$ such that $\Omega[D_i] \subseteq \lambda(v_{c\ell})$ (see Fact 6). By FACT 8 it follows that the sets $D^\ell$ ($1 \le \ell \le s$) are mutually disjoint. But then the union of these sets is of cardinality $3s = r$, and hence the union must coincide with $R$. Thus $s$ subsets out of $\Delta$ cover $R$ and $(R, \Delta)$ is a positive instance of EXACT COVER BY 3-SETS. $\Box$



4 Conjunctive Queries of Bounded Hypertree-Width 

4.1 Hypertree Width 

Let $Q$ be a (conjunctive) query. A hypertree for $Q$ is a triple $(T, \chi, \lambda)$, where $T = (N, E)$ is a rooted tree, and $\chi$ and $\lambda$ are labeling functions which associate to each vertex $p \in N$ two sets $\chi(p) \subseteq var(Q)$ and $\lambda(p) \subseteq atoms(Q)$. If $T' = (N', E')$ is a subtree of $T$, we define $\chi(T') = \bigcup_{v \in N'} \chi(v)$. We denote the set of vertices of $T$ by $vertices(T)$, and the root of $T$ by $root(T)$. Moreover, for any $p \in N$, $T_p$ denotes the subtree of $T$ rooted at $p$.

Definition 4.1 A hypertree decomposition of a conjunctive query $Q$ is a hypertree $(T, \chi, \lambda)$ for $Q$ which satisfies all the following conditions:

1. for each atom $A \in atoms(Q)$, there exists $p \in vertices(T)$ such that $var(A) \subseteq \chi(p)$;

2. for each variable $Y \in var(Q)$, the set $\{p \in vertices(T) \mid Y \in \chi(p)\}$ induces a (connected) subtree of $T$;

3. for each vertex $p \in vertices(T)$, $\chi(p) \subseteq var(\lambda(p))$;

4. for each vertex $p \in vertices(T)$, $var(\lambda(p)) \cap \chi(T_p) \subseteq \chi(p)$.

A hypertree decomposition $(T, \chi, \lambda)$ of $Q$ is a complete decomposition of $Q$ if, for each atom $A \in atoms(Q)$, there exists $p \in vertices(T)$ such that $var(A) \subseteq \chi(p)$ and $A \in \lambda(p)$.

The width of the hypertree decomposition $(T, \chi, \lambda)$ is $\max_{p \in vertices(T)} |\lambda(p)|$. The hypertree width $hw(Q)$ of $Q$ is the minimum width over all its hypertree decompositions.

In analogy to join trees and query decompositions, we will refer to Condition 2 above as the Connectedness Condition. Note that, by Condition 1, $\chi(T) = var(Q)$. Hence Condition 4 entails that, for $s_0 = root(T)$, $var(\lambda(s_0)) = \chi(s_0)$.

Intuitively, the $\chi$ labeling selects the set of variables to be fixed in order to split the cycles and achieve acyclicity; $\lambda(p)$ "covers" the variables of $\chi(p)$ by a set of atoms. Thus, the relations associated to the atoms of $\lambda(p)$ restrict the range of the variables of $\chi(p)$. For the evaluation of query $Q$, each vertex $p$ of the decomposition is replaced by a new atom whose associated database relation is the projection on $\chi(p)$ of the join of the relations in $\lambda(p)$. This way, we obtain a join tree $JT$ of an acyclic query $Q'$ over database $DB'$ of size $O(n^k)$, where $n$ is the input size and $k$ is the width of the hypertree decomposition. All the efficient techniques available for acyclic queries can then be employed for the evaluation of $Q'$.

More technically, Condition 1 and Condition 2 above extend the notion of tree decomposition [24] from graphs to hypergraphs (the hypergraph of a query $Q$ groups the variables of the same atom in one hyperedge). Thus, the pair $(T, \chi)$ of a hypertree decomposition $(T, \chi, \lambda)$ of a conjunctive query $Q$ can be seen as the correspondent of a tree decomposition on the query hypergraph. However, the treewidth of $(T, \chi)$ (i.e., the maximum cardinality of the $\chi$-labels of the vertices of $T$) is not an appropriate measure of the width of the hypertree decomposition, because a set of $m$ variables appearing in the same atom should count 1 rather than $m$ for the width. Thus, $\lambda(p)$ provides a set of atoms which "covers" $\chi(p)$ and its cardinality gives the measure of the width of vertex $p$. It is worthwhile noting that $(T, \lambda)$ may violate the classical connectedness condition usually imposed on the variables of the join trees, as it is allowed that a variable $X$ appears in both $\lambda(p)$ and $\lambda(q)$ while it does not appear in $\lambda(s)$, for some vertex $s$ on the path from $p$ to $q$ in $T$. However, this violation is not a problem, as the variables in $var(\lambda(p)) - \chi(p)$ are meaningless and can be projected out before starting the query evaluation process, because the role of $\lambda(p)$ is just that of providing a binding for the variables of $\chi(p)$.



[Figure 5: A 2-width hypertree decomposition of (a) query $Q_1$ and (b) query $Q_5$]



Example 4.2 The hypertree width of the cyclic query $Q_1$ of Example 1.1 is 2; a (complete) 2-width hypertree decomposition of $Q_1$ is shown in Figure 5a.

Consider the following conjunctive query $Q_5$:

$ans \leftarrow a(X_{ab}, X, X', X_{ac}, X_{af}) \wedge b(X_{ab}, Y, Y', X_{bc}, X_{bf}) \wedge c(X_{ac}, X_{bc}, Z) \wedge d(X, Z) \wedge e(Y, Z) \wedge f(X_{af}, X_{bf}, Z') \wedge g(X', Z') \wedge h(Y', Z') \wedge j(J, X, Y, X', Y')$

$Q_5$ is clearly cyclic, and thus $hw(Q_5) > 1$ (as only acyclic queries have hypertree width equal to 1). Figure 5b shows a (complete) hypertree decomposition of $Q_5$ having width 2, hence $hw(Q_5) = 2$.
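
The four conditions of Definition 4.1 can be verified directly on a candidate decomposition. The sketch below is an illustration under assumptions: the data layout, the function names, and in particular the width-2 decomposition of $Q_5$ encoded in it are written out here by hand and are not guaranteed to match the one drawn in the figure. Primed variables are written with a trailing `1` (for example `X1` for $X'$).

```python
def subtree_vertices(children, p):
    out, stack = [], [p]
    while stack:
        v = stack.pop()
        out.append(v)
        stack.extend(children.get(v, []))
    return out


def is_hypertree_decomposition(atoms, children, root, chi, lam):
    """atoms: name -> set of vars; children: vertex -> list of children;
    chi: vertex -> set of vars; lam: vertex -> set of atom names."""
    nodes = subtree_vertices(children, root)
    # Condition 1: every atom is covered by some chi label.
    if not all(any(vs <= chi[p] for p in nodes) for vs in atoms.values()):
        return False
    # Condition 2: in a rooted tree, a vertex set is connected iff exactly one
    # of its members has no parent inside the set.
    parent = {c: p for p in nodes for c in children.get(p, [])}
    for y in set().union(*atoms.values()):
        holders = [p for p in nodes if y in chi[p]]
        tops = [p for p in holders if p == root or y not in chi[parent[p]]]
        if len(tops) != 1:
            return False
    for p in nodes:
        lam_vars = set().union(*(atoms[a] for a in lam[p]))
        below = set().union(*(chi[v] for v in subtree_vertices(children, p)))
        if not chi[p] <= lam_vars:                 # Condition 3
            return False
        if not (lam_vars & below) <= chi[p]:       # Condition 4
            return False
    return True


Q5 = {"a": {"Xab", "X", "X1", "Xac", "Xaf"}, "b": {"Xab", "Y", "Y1", "Xbc", "Xbf"},
      "c": {"Xac", "Xbc", "Z"}, "d": {"X", "Z"}, "e": {"Y", "Z"},
      "f": {"Xaf", "Xbf", "Z1"}, "g": {"X1", "Z1"}, "h": {"Y1", "Z1"},
      "j": {"J", "X", "Y", "X1", "Y1"}}
children = {0: [1], 1: [2, 3], 2: [4, 5], 3: [6, 7]}
chi = {0: {"J", "X", "Y", "X1", "Y1"},
       1: {"X", "X1", "Y", "Y1", "Xab", "Xac", "Xaf", "Xbc", "Xbf"},
       2: {"X", "Y", "Xac", "Xbc", "Z"}, 3: {"X1", "Y1", "Xaf", "Xbf", "Z1"},
       4: {"X", "Z"}, 5: {"Y", "Z"}, 6: {"X1", "Z1"}, 7: {"Y1", "Z1"}}
lam = {0: {"j"}, 1: {"a", "b"}, 2: {"j", "c"}, 3: {"j", "f"},
       4: {"d"}, 5: {"e"}, 6: {"g"}, 7: {"h"}}
print(is_hypertree_decomposition(Q5, children, 0, chi, lam),
      max(len(l) for l in lam.values()))
```

In this encoding, vertices 2 and 3 reuse the atom $j$ in their $\lambda$ labels only to bind $X, Y$ (respectively $X', Y'$); the remaining variables of $j$ do not occur in the corresponding subtrees, so Condition 4 is not violated.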



Definition 4.1 does not require the presence of all query atoms in a decomposition $HD$, as it is sufficient that every atom is "covered" by some vertex $p$ of $HD$ (i.e., its variables are included in $\chi(p)$). However, every missing atom can be easily added, yielding a complete decomposition.

Lemma 4.3 Given a conjunctive query $Q$, every $k$-width hypertree decomposition of $Q$ can be transformed in Logspace into a $k$-width complete hypertree decomposition of $Q$.

Proof. Let $Q$ be a conjunctive query and $HD = (T, \chi, \lambda)$ a hypertree decomposition of $Q$. In order to transform $HD$ into a complete decomposition, modify $HD$ as follows. For each atom $A \in atoms(Q)$ such that no vertex $q \in vertices(T)$ satisfies $var(A) \subseteq \chi(q)$ and $A \in \lambda(q)$, create a new vertex $v_A$ with $\lambda(v_A) := \{A\}$ and $\chi(v_A) := var(A)$, and attach $v_A$ as a new child of a vertex $p \in vertices(T)$ s.t. $var(A) \subseteq \chi(p)$. (By Condition 1 of Definition 4.1 such a $p$ must exist.)

This transformation is obviously feasible in Logspace. $\Box$
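
The completion step of the proof is short enough to write down. The following sketch assumes the same dictionary-based encoding of a hypertree as a child map plus two label maps; the function name and the naming scheme for new leaves are hypothetical.

```python
def complete(atoms, children, chi, lam):
    """Add a leaf for every atom A without a vertex q such that
    var(A) <= chi[q] and A in lam[q]; returns new (children, chi, lam)."""
    children = {p: list(cs) for p, cs in children.items()}
    chi, lam = dict(chi), dict(lam)
    for name, vs in atoms.items():
        if any(vs <= chi[q] and name in lam[q] for q in list(chi)):
            continue
        p = next(q for q in chi if vs <= chi[q])   # exists by Condition 1
        leaf = ("new", name)
        chi[leaf], lam[leaf] = set(vs), {name}
        children.setdefault(p, []).append(leaf)
    return children, chi, lam


# A triangle query with a one-vertex decomposition of width 2 that omits atom u.
atoms = {"r": {"A", "B"}, "s": {"B", "C"}, "u": {"A", "C"}}
children, chi, lam = complete(atoms, {}, {0: {"A", "B", "C"}}, {0: {"r", "s"}})
print(children, lam)
```

Every added vertex carries a single atom, so the width of the decomposition does not grow (for width at least 1).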

The acyclic queries are precisely the queries of hypertree width one.

Theorem 4.4 A conjunctive query $Q$ is acyclic if and only if $hw(Q) = 1$.

Proof. (Only if part.) If $Q$ is an acyclic query, there exists a join tree $JT(Q)$ for $Q$. Let $T$ be a tree, and $f$ a bijection from $vertices(JT(Q))$ to $vertices(T)$ such that, for any $p, q \in vertices(JT(Q))$, there is an edge between $p$ and $q$ in $JT(Q)$ if and only if there is an edge between $f(p)$ and $f(q)$ in $T$. Moreover, let $\lambda$ be the following labeling function: If $p$ is a vertex of $JT(Q)$ and $A$ is the atom of $Q$ associated to $p$, then $\lambda(f(p)) = \{A\}$. For any vertex $p' \in vertices(T)$ define $\chi(p') = var(\lambda(p'))$. Then, $(T, \chi, \lambda)$ is clearly a width 1 hypertree decomposition of $Q$.

(If part.) Let $HD = (T, \chi, \lambda)$ be a width 1 hypertree decomposition of $Q$. W.l.o.g., assume that $HD$ is a complete hypertree decomposition. Since $HD$ has width 1, all the $\lambda$ labels are singletons, i.e., $\lambda$ associates one atom of $Q$ to each vertex of $T$.

We next show how to transform $HD$ into a width 1 complete hypertree decomposition of $Q$ such that, for any vertex $p \in vertices(T)$, $\chi(p) = var(A)$, where $\{A\} = \lambda(p)$, and $p$ is the unique vertex labeled with the atom $A$.

Choose any total ordering $\prec$ of the vertices of $T$. For any atom $A \in atoms(Q)$, denote by $v(A)$ the $\prec$-least vertex of $T$ such that $\chi(v(A)) = var(A)$ and $\lambda(v(A)) = \{A\}$. The existence of such a vertex is guaranteed by definition of complete hypertree decomposition and by the hypothesis that every $\lambda$ label consists of exactly one atom.

For any atom $A \in atoms(Q)$, and for any vertex $p \ne v(A)$ such that $\lambda(p) = \{A\}$, perform the following actions. For any child $p'$ of $p$, delete the edge between $p$ and $p'$ and let $p'$ be a new child of $v(A)$, hence the subtree $T_{p'}$ is now attached to $v(A)$. Then, delete vertex $p$. By Condition 3 of Definition 4.1, $\chi(p) \subseteq var(\lambda(p))$. Since $var(\lambda(p)) = var(A) = \chi(v(A))$, we get $\chi(p) \subseteq \chi(v(A))$. Then, it is easy to see that the (transformed) tree $T$ satisfies the connectedness condition.

Eventually, we obtain a new hypertree $H' = (T', \chi, \lambda)$ such that $vertices(T') \subseteq vertices(T)$ and $H'$ has the following properties: (i) for any $A \in atoms(Q)$, there exists exactly one vertex $p = v(A)$ of $T'$ such that $\lambda(p) = \{A\}$ and $\chi(p) = var(A)$; (ii) for any vertex $p$ of $T'$, $p = v(A)$ holds, for some $A \in atoms(Q)$; (iii) $H'$ satisfies the connectedness condition. Thus, $H'$ clearly corresponds to a join tree of $Q$. $\Box$



4.2 Efficient Query Evaluation 

Lemma 4.5 Let $Q$ be a Boolean conjunctive query over a database $DB$, and $HD = (T, \chi, \lambda)$ a hypertree decomposition of $Q$ of width $k$. Then, there exist $Q'$, $DB'$, $JT$ such that:

1. $Q'$ is an acyclic (Boolean) conjunctive query answering 'yes' on database $DB'$ iff the answer of $Q$ on $DB$ is 'yes'.

2. $\|(Q', DB', JT)\| = O(\|(Q, DB, HD)\|^k)$.

3. $JT$ is a join tree of the query $Q'$.

4. $(Q', DB', JT)$ is logspace computable from $(Q, DB, HD)$.

Proof. Let $Q$ be a Boolean conjunctive query over a database $DB$, and $HD = (T, \chi, \lambda)$ a hypertree decomposition of $Q$ of width $k$. By Lemma 4.3, we can assume that $HD = (T, \chi, \lambda)$ is a complete decomposition of $Q$. W.l.o.g., we also assume $Q$ does not contain any atom $A$ such that $var(A) = \emptyset$.

Note that $Q$ evaluates to true on $DB$ if and only if $\bowtie_{A \in atoms(Q)} rel(A)$ is a non-empty relation, where $rel(A)$ denotes the relation of $DB$ associated to the atom $A$, and $\bowtie$ is the natural join operation (with common variables acting as join attributes).

For each vertex $p \in vertices(T)$ define a query $Q(p)$ and a database $DB(p)$ as follows. For each atom $A \in \lambda(p)$:

• if $var(A) \subseteq \chi(p)$, then $A$ occurs in $Q(p)$ and $rel(A)$ belongs to $DB(p)$;

• if $(var(A) \cap \chi(p)) \ne \emptyset$, then $Q(p)$ contains a new atom $A'$ such that $var(A') = var(A) \cap \chi(p)$, and $DB(p)$ contains the corresponding relation $rel(A')$, which is the projection of $rel(A)$ on the set of attributes corresponding to the variables in $var(A')$.

Now, consider the following query $\bar{Q}$ on the database $\overline{DB} = \bigcup_{p \in vertices(T)} DB(p)$:

$\bar{Q} : \bigwedge_{p \in vertices(T)} Q(p)$

By the associative and commutative properties of natural joins, and by the fact that $HD$ is a complete hypertree decomposition, it immediately follows that $\bar{Q}$ on $\overline{DB}$ is equivalent to $Q$ on $DB$.

We build $(Q', DB', JT)$ as follows. $JT$ has exactly the same tree shape as $T$. For each vertex $p$ of $T$, there is precisely one vertex $p'$ in $JT$, and one relation $P'$ in $DB'$. $p'$ is an atom having $\chi(p)$ as arguments and its corresponding relation $P'$ in $DB'$ is the result of the query $Q(p)$ on $DB(p)$. $Q'$ is the conjunction of all vertices (atoms) of $JT$. $Q'$ on $DB'$ is clearly equivalent to $\bar{Q}$ on $\overline{DB}$, and $JT$ is a join tree of $Q'$. Moreover, $\|DB'\| = O(\|DB\|^k)$, $\|JT\| = O(\|HD\|)$ and $\|Q'\| = O(\|HD\|)$; thus, $\|(Q', DB', JT)\| = O(\|(Q, DB, HD)\|^k)$.

The transformation is clearly feasible in Logspace. $\Box$
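
The reduction of Lemma 4.5 can be illustrated on a toy database. In the sketch below, relations are lists of rows mapping variables to values, and each vertex relation $P'$ is built exactly as in the proof (project every $\lambda$ atom on its $\chi$ variables, then join). The final Boolean test uses a bottom-up semijoin pass over the join tree, in the style of Yannakakis' algorithm for acyclic queries; that choice, like all function names here, is an assumption of this sketch and not a procedure prescribed by the text. The decomposition passed in is assumed to be complete.

```python
def join(r1, r2):
    """Natural join of relations given as lists of {variable: value} rows."""
    return [{**a, **b} for a in r1 for b in r2
            if all(a[v] == b[v] for v in a.keys() & b.keys())]


def project(rel, variables):
    """Duplicate-free projection of `rel` onto `variables`."""
    keys = {tuple(sorted((v, row[v]) for v in variables)) for row in rel}
    return [dict(k) for k in keys]


def vertex_relation(p, atoms, db, chi, lam):
    """Relation P' of vertex p: join of rel(A) projected on var(A) & chi(p)."""
    rel = [{}]                                   # neutral element of the natural join
    for a in lam[p]:
        keep = atoms[a] & chi[p]
        if keep:
            rel = join(rel, project(db[a], keep))
    return rel


def evaluate(atoms, db, children, root, chi, lam):
    """Boolean answer of Q on db, given a complete hypertree decomposition."""
    def solve(p):
        rel = vertex_relation(p, atoms, db, chi, lam)
        for c in children.get(p, []):
            below = solve(c)                     # semijoin with each reduced child
            rel = [r for r in rel
                   if any(all(r[v] == t[v] for v in r.keys() & t.keys()) for t in below)]
        return rel
    return bool(solve(root))


# Triangle query r(A,B), s(B,C), u(A,C) with a complete width-2 decomposition.
atoms = {"r": {"A", "B"}, "s": {"B", "C"}, "u": {"A", "C"}}
db = {"r": [{"A": 1, "B": 2}, {"A": 2, "B": 3}],
      "s": [{"B": 2, "C": 3}, {"B": 3, "C": 1}],
      "u": [{"A": 1, "C": 3}]}
children = {0: [1]}
chi = {0: {"A", "B", "C"}, 1: {"A", "C"}}
lam = {0: ["r", "s"], 1: ["u"]}
print(evaluate(atoms, db, children, 0, chi, lam))
```

Each vertex relation is the join of at most $k$ projected input relations, which is where the $O(\|DB\|^k)$ bound on $\|DB'\|$ comes from.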



Theorem 4.6 Given a database DB, a Boolean conjunctive query Q, and a k-width hypertree decomposition 
of Q for a fixed constant k > 0, deciding whether Q evaluates to true on DB is LOGCFL-complete. 



Theorem 4.7 Given a database DB, a ( non-Boolean ) conjunctive query Q, and a k-width hypertree decom- 
position of Q for a fixed constant k > 0, the answer of Q on DB can be computed in time polynomial in the 
combined size of the input instance and of the output relation. 

Remark. In this section we demonstrated that $k$-bounded hypertree-width queries are efficiently computable, once a $k$-width hypertree decomposition of the query is given as (additional) input. In Section 5, we will strengthen these results by showing that providing the hypertree decomposition in input is unnecessary, as, unlike query decompositions, a hypertree decomposition can be computed very efficiently (in $L^{LOGCFL}$, i.e., in functional LOGCFL).



5 Bounded Hypertree Decompositions are Efficiently Computable 
5.1 Normal form 

Let $V \subseteq var(Q)$ be a set of variables, and $X, Y \in var(Q)$ a pair of variables occurring in $Q$; then $X$ is $[V]$-adjacent to $Y$ if there exists an atom $A \in atoms(Q)$ such that $\{X, Y\} \subseteq (var(A) - V)$. A $[V]$-path $\pi$ from $X$ to $Y$ consists of a sequence $X = X_0, \ldots, X_h = Y$ of variables and a sequence of atoms $A_0, \ldots, A_{h-1}$ ($h \ge 0$) such that: $X_i$ is $[V]$-adjacent to $X_{i+1}$ and $\{X_i, X_{i+1}\} \subseteq var(A_i)$, for each $i \in [0 \ldots h-1]$. We denote by $var(\pi)$ (resp. $atoms(\pi)$) the set of variables (atoms) occurring in the sequence $X_0, \ldots, X_h$ ($A_0, \ldots, A_{h-1}$).

Let $V \subseteq var(Q)$ be a set of variables occurring in a query $Q$. A set $W \subseteq var(Q)$ of variables is $[V]$-connected if $\forall X, Y \in W$ there is a $[V]$-path from $X$ to $Y$. A $[V]$-component is a maximal $[V]$-connected non-empty set of variables $W \subseteq (var(Q) - V)$.

Note that the variables in $V$ do not belong to any $[V]$-component (i.e., $V \cap C = \emptyset$ for each $[V]$-component $C$).

Let $C$ be a $[V]$-component for some set of variables $V$. We define:

$atoms(C) := \{A \in atoms(Q) \mid var(A) \cap C \ne \emptyset\}$.

Note that, for any set of variables $V$, and for every atom $A \in atoms(Q)$ such that $var(A) \not\subseteq V$, there exists exactly one $[V]$-component $C$ of $Q$ such that $A \in atoms(C)$.
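
The $[V]$-components of a query are simply the connected components of the variables outside $V$ under $[V]$-adjacency, so they can be computed by a plain graph search. A minimal sketch, assuming atoms are given as a list of variable sets (function names are illustrative):

```python
def components(atoms, V):
    """Return the [V]-components of the query whose atoms are the given variable sets."""
    V = set(V)
    rest = set().union(*atoms) - V
    comps = []
    while rest:
        comp, frontier = set(), [rest.pop()]
        while frontier:
            x = frontier.pop()
            comp.add(x)
            for a in atoms:
                if x in a:
                    # every other variable of a outside V is [V]-adjacent to x
                    frontier.extend(set(a) - V - comp)
        rest -= comp
        comps.append(comp)
    return comps


def atoms_of(atoms, C):
    """atoms(C): the atoms sharing at least one variable with component C."""
    return [a for a in atoms if set(a) & C]


# The atoms of Q4 from Example 3.2.
Q4_atoms = [{"Y", "Z", "U"}, {"X", "Y"}, {"Z", "X"}, {"Z", "W", "X"}, {"Y", "Z"}]
print(components(Q4_atoms, {"Z"}))
print(sorted(map(sorted, components(Q4_atoms, {"X", "Y", "Z"}))))
```

Removing only $Z$ leaves a single component, while removing $X$, $Y$, $Z$ isolates $U$ and $W$ into two singleton components.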

Furthermore, let $H = (T, \chi, \lambda)$ be a hypertree of $Q$ and $V \subseteq var(Q)$ a set of variables. We define $vertices(V, H) = \{p \in vertices(T) \mid \chi(p) \cap V \ne \emptyset\}$.

For any vertex $v$ of $T$, we will often use $v$ as a synonym of $\chi(v)$. In particular, $[v]$-component denotes $[\chi(v)]$-component; the term $[v]$-path is a synonym of $[\chi(v)]$-path; and so on.

Definition 5.1 A hypertree decomposition $HD = (T, \chi, \lambda)$ of a conjunctive query $Q$ is in normal form (NF) if for each vertex $r \in vertices(T)$, and for each child $s$ of $r$, all the following conditions hold:

1. there is (exactly) one $[r]$-component $C_r$ such that $\chi(T_s) = C_r \cup (\chi(s) \cap \chi(r))$;

2. $\chi(s) \cap C_r \ne \emptyset$, where $C_r$ is the $[r]$-component satisfying Point 1;

3. $var(\lambda(s)) \cap \chi(r) \subseteq \chi(s)$.

Note that Condition 2 above entails that, for each vertex $r \in vertices(T)$, and for each child $s$ of $r$, $\chi(s) \not\subseteq \chi(r)$. Indeed, $C_r \cap \chi(r) = \emptyset$, and $s$ must contain some variable belonging to the $[r]$-component $C_r$.

Lemma 5.2 Let $HD = (T, \chi, \lambda)$ be a hypertree decomposition of a conjunctive query $Q$. Let $r$ be a vertex of $T$, let $s$ be a child of $r$, and let $C$ be an $[r]$-component of $Q$ such that $C \cap \chi(T_s) \ne \emptyset$. Then, $vertices(C, HD) \subseteq vertices(T_s)$.

Proof. For any subtree $T'$ of $T$, let $covered(T')$ denote the set $\{A \in atoms(Q) \mid var(A) \subseteq \chi(v) \text{ for some } v \in vertices(T')\}$.

Since $C \cap \chi(T_s) \ne \emptyset$, there exists a vertex $p \in vertices(T_s)$ which also belongs to $vertices(C, HD)$. We proceed by contradiction. Assume there exists some vertex $q \in vertices(C, HD)$ such that $q \notin vertices(T_s)$. By definition of $vertices(C, HD)$, there exists a pair of variables $\{X, Y\} \subseteq C$ such that $X \in \chi(p)$ and $Y \in \chi(q)$. Since $X, Y \in C$, there exists an $[r]$-path $\pi$ from $X$ to $Y$ consisting of a sequence of variables $X = X_0, \ldots, X_i, X_{i+1}, \ldots, X_\ell = Y$, and a sequence of atoms $A_0, \ldots, A_i, A_{i+1}, \ldots, A_{\ell-1}$.

Note that $Y \notin \chi(T_s)$. Indeed, $Y \notin \chi(r)$, hence any occurrence of $Y$ in $\chi(v)$, for some vertex $v$ of $T_s$, would violate Condition 2 of Definition 4.1. Similarly, $X$ only occurs as a variable in $\chi(T_s)$. As a consequence, $A_0 \in covered(T_s)$ (by Condition 1 of Definition 4.1) and $A_{\ell-1} \notin covered(T_s)$, hence the $[r]$-path $\pi$ leaves $T_s$, i.e., $atoms(\pi) \not\subseteq covered(T_s)$.

Assume w.l.o.g. that the atoms $A_i, A_{i+1} \in atoms(\pi)$ form the "frontier" of this path w.r.t. $T_s$, i.e., $A_i \in covered(T_s)$ and $A_{i+1} \notin covered(T_s)$, and consider the variable $X_{i+1}$, which occurs in both $A_i$ and $A_{i+1}$. $X_{i+1}$ belongs to $C$, hence it does not occur in $\chi(r)$, and this immediately yields a contradiction to Condition 2 of Definition 4.1. $\Box$



Lemma 5.3 Let HD = (T, x, A) be a hypertree decomposition of a conjunctive query Q and r G vertices(T). 
If V is an [r]-connected set of variables in var{Q) — x( r )> then vertices(V, HD) induces a (connected) sub- 
tree ofT. 

Proof. We use induction on |V|.

Basis. If |V| = 1, then V is a singleton, and the statement follows from Condition 2 of Definition 4.1.

Induction Step. Assume the statement is established for sets of variables having cardinality c ≤ h. Let V be an [r]-connected set of variables such that |V| = h + 1, and let X ∈ V be any variable of V such that V − {X} remains [r]-connected. (It is easy to see that such a variable exists.) By the induction hypothesis, vertices(V − {X}, HD) induces a connected subtree of T. Moreover, {X} is a singleton, thus vertices({X}, HD) induces a connected subtree of T, too.

Since X ∈ V, V is [r]-connected, and |V| > 1, there exists a variable Y ∈ V − {X} which is [r]-adjacent to X. Hence, there exists an atom A ∈ atoms(Q) such that {X, Y} ⊆ var(A). By Condition 1 of Definition 4.1 there exists a vertex p ∈ vertices(T) such that var(A) ⊆ χ(p). Note that vertices(V, HD) = vertices(V − {X}, HD) ∪ vertices({X}, HD), and p belongs to both vertices(V − {X}, HD) and vertices({X}, HD). Then, both sets induce connected subgraphs of T that are, moreover, connected to each other via the vertex p. Thus, vertices(V, HD) induces a connected subgraph of T, and hence a subtree, because T is a tree. □
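The notions of [V]-adjacency and [V]-component used throughout these proofs can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: a query is modelled as a list of variable sets (one per atom), and the function name `components` is hypothetical, not taken from the paper. Two variables are merged whenever they occur together in an atom outside the separator V, so the resulting groups are the maximal [V]-connected sets.

```python
def components(atoms, separator):
    """Return the [V]-components of a query, where V = separator.

    atoms: iterable of sets of variables, one set per atom (assumed encoding).
    Two variables are [V]-adjacent if some atom contains both of them
    and neither belongs to V; components are the resulting equivalence classes.
    """
    separator = set(separator)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for atom in atoms:
        free = [v for v in atom if v not in separator]
        for v in free:
            parent.setdefault(v, v)
        # Variables of one atom that lie outside V are pairwise adjacent.
        for a, b in zip(free, free[1:]):
            parent[find(a)] = find(b)

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


# A four-atom cycle: removing {X, Z} separates Y from W.
cycle = [{"X", "Y"}, {"Y", "Z"}, {"Z", "W"}, {"W", "X"}]
two_parts = set(components(cycle, {"X", "Z"}))
one_part = set(components(cycle, {"X"}))
```

With the separator {X, Z} the cycle falls apart into the components {Y} and {W}, whereas the separator {X} leaves the single component {Y, Z, W}.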



Theorem 5.4 For each k-width hypertree decomposition of a conjunctive query Q there exists a k-width hypertree decomposition of Q in normal form.

Proof. Let HD = ⟨T, χ, λ⟩ be any k-width hypertree decomposition of Q. We show how to transform HD into a k-width hypertree decomposition in normal form.



Assume there exist two vertices r and s s.t. s is a child of r, and s violates any condition of Definition 5.1. If s satisfies Condition 1, but violates Condition 2, then χ(s) ⊆ χ(r) holds. In this case, simply eliminate vertex s from the tree as shown in Figure 6. It is immediate to see that this transformation is correct.

Assume T_s does not meet Condition 1 of Definition 5.1, and let C_1, ..., C_h be all the [r]-components containing some variable occurring in χ(T_s). Hence, χ(T_s) ⊆ (⋃_{1≤i≤h} C_i ∪ χ(r)). For each [r]-component C_i (1 ≤ i ≤ h), consider the set of vertices vertices(C_i, HD). Note that, by Lemma 5.3, vertices(C_i, HD) induces a subtree of T, and by Lemma 5.2 vertices(C_i, HD) ⊆ vertices(T_s), hence vertices(C_i, HD) induces in fact a subtree of T_s.

For each vertex v ∈ vertices(C_i, HD) define a new vertex new(v, C_i), and let λ(new(v, C_i)) = λ(v) and χ(new(v, C_i)) = χ(v) ∩ (C_i ∪ χ(r)). Note that χ(new(v, C_i)) ≠ ∅, because by definition of vertices(C_i, HD), χ(v) contains some variable belonging to C_i. Let N_i = {new(v, C_i) | v ∈ vertices(C_i, HD)}. Moreover, for any C_i (1 ≤ i ≤ h), let T_i denote the (directed) graph (N_i, E_i) such that new(p, C_i) is a child of new(q, C_i) iff p is a child of q in T. T_i is clearly isomorphic to the subtree of T_s induced by vertices(C_i, HD), hence T_i is a tree, as well.

Now, transform the hypertree decomposition HD as follows. Delete every vertex in vertices(T_s) from T, and attach to r every tree T_i for 1 ≤ i ≤ h. Intuitively, we replace the subtree T_s by the set of trees {T_1, ..., T_h}. By construction, T_i contains a vertex new(v, C_i) for each vertex v belonging to vertices(C_i, HD) (1 ≤ i ≤ h). Then, if we let children(r) denote the set of children of r in the new tree T obtained after the transformation above, it holds that for any s' ∈ children(r), there exists an [r]-component C of Q such that vertices(T_{s'}) = vertices(C, HD), and χ(T_{s'}) ⊆ (C ∪ χ(r)). Furthermore, it is easy to verify that all the conditions of Definition 4.1 are preserved during this transformation. As a consequence, Condition 2 of Definition 4.1 immediately entails that (χ(T_{s'}) ∩ χ(r)) ⊆ χ(s'). Hence, χ(T_{s'}) = C ∪ (χ(s') ∩ χ(r)). Thus, any child of r satisfies both Condition 1 and Condition 2 of Definition 5.1.

Now, assume that some vertex v ∈ children(r) violates Condition 3 of Definition 5.1. Then, add to the label χ(v) the set of variables var(λ(v)) ∩ χ(r). Because variables in χ(r) induce connected subtrees of T, and χ(r) does not contain any variable occurring in some [r]-component, this further transformation never invalidates any other condition. Moreover, no new vertex is labeled by a set of atoms with cardinality greater than k, so we get in fact a legal k-width hypertree decomposition.

Clearly, root(T) cannot violate any of the normal form conditions, because it has no parent in T. Moreover, the transformations above never change the parent r of a violating vertex s. Thus, if we apply such a transformation to the children of root(T), and iterate the process on the new children of root(T), and so on, we eventually get a new k-width hypertree decomposition ⟨T', χ', λ'⟩ of Q in normal form. □

Figure 6: Normalizing a hypertree decomposition

If HD = ⟨T, χ, λ⟩ is an NF hypertree decomposition of a conjunctive query Q, we can associate a set treecomp(s) ⊆ var(Q) to each vertex s of T as follows.

• If s = root(T), then treecomp(s) = var(Q);

• otherwise, let r be the father of s in T; then, treecomp(s) is the (unique) [r]-component C such that χ(T_s) = C ∪ (χ(s) ∩ χ(r)).

Note that, since s ∈ vertices(T_s), also χ(T_s) = C ∪ χ(s) holds.

Lemma 5.5 Let HD = ⟨T, χ, λ⟩ be an NF hypertree decomposition of a conjunctive query Q, v a vertex of T, and W = treecomp(v) − χ(v). Then, for any [v]-component C such that (C ∩ W) ≠ ∅, C ⊆ W holds.

Therefore, the set 𝒞 = {C' ⊆ var(Q) | C' is a [v]-component and C' ⊆ treecomp(v)} is a partition of treecomp(v) − χ(v).

Proof. Let C be a [v]-component such that (C ∩ W) ≠ ∅. We show that C ⊆ W. Assume this is not true, i.e., C − W ≠ ∅. By definition of treecomp(v), χ(T_v) = treecomp(v) ∪ χ(v). Hence, any variable Y ∈ (C − W) only occurs in the χ labels of vertices not belonging to vertices(T_v). However, C is a [v]-component, therefore C ∩ χ(v) = ∅. As a consequence, vertices(C, HD) induces a disconnected subgraph of T, and thus contradicts Lemma 5.3. □



Lemma 5.6 Let HD = ⟨T, χ, λ⟩ be an NF hypertree decomposition of a conjunctive query Q, and r be a vertex of T. Then, C = treecomp(s) for some child s of r if and only if C is an [r]-component of Q and C ⊆ treecomp(r).

Proof. (If part.) Assume C is an [r]-component of Q and C ⊆ treecomp(r). Let children(r) denote the set of the vertices of T which are children of r. Because C ⊆ (treecomp(r) − χ(r)), C must be included in ⋃_{s ∈ children(r)} χ(T_s). Moreover, for each subtree T_s of T such that s ∈ children(r), there is a (unique) [r]-component treecomp(s) such that χ(T_s) = treecomp(s) ∪ (χ(s) ∩ χ(r)). Therefore, C necessarily coincides with one of these components, say treecomp(s) for some s ∈ children(r).

(Only if part.) Assume C = treecomp(s) for some child s of r, and let C' = treecomp(r). By definition of treecomp(s), C is an [r]-component, hence (C ∩ χ(r)) = ∅. Since HD is in normal form, χ(T_s) = C ∪ (χ(s) ∩ χ(r)) and χ(T_r) = (C' ∪ χ(r)). Moreover, s is a child of r, hence vertices(T_s) ⊆ vertices(T_r) and thus χ(T_s) ⊆ χ(T_r). Therefore, C ∪ (χ(s) ∩ χ(r)) ⊆ χ(T_r), and hence we immediately get C ⊆ (C' ∪ χ(r)). However, (C ∩ χ(r)) = ∅ and thus C ⊆ C'. □



Lemma 5.7 For any NF hypertree decomposition HD = ⟨T, χ, λ⟩ of a query Q, |vertices(T)| ≤ |var(Q)|.

Proof. Follows from Lemma 5.6, Lemma 5.5, and Condition 2 of the normal form, which states that, for any v ∈ vertices(T), χ(v) ∩ treecomp(v) ≠ ∅. Hence, treecomp(v) − χ(v) ⊂ treecomp(v) and thus, for any child s of v in T, treecomp(s) is actually a proper subset of treecomp(v). □



Lemma 5.8 Let HD = ⟨T, χ, λ⟩ be an NF hypertree decomposition of a query Q, s a vertex of T, and C a set of variables such that C ⊆ treecomp(s). Then, C is an [s]-component if and only if C is a [var(λ(s))]-component.



Proof. Let V = var(λ(s)). By Condition 4 of Definition 4.1, (V ∩ χ(T_s)) ⊆ χ(s). Since HD is in normal form, V satisfies the following property.

(1) (V ∩ treecomp(s)) ⊆ χ(s).

(Only if part.) Assume C ⊆ treecomp(s) is an [s]-component. From Property (1) above, C ∩ V = ∅ holds. As a consequence, for any pair of variables {X, Y} ⊆ C, X [s]-adjacent to Y entails X [V]-adjacent to Y. Hence, C is a [V]-connected set of variables. Moreover, χ(s) ⊆ V. Then, any [V]-connected set which is a maximal [s]-connected set is a maximal [V]-connected set as well, and thus C is a [V]-component.

(If part.) Assume C ⊆ treecomp(s) is a [V]-component. Since χ(s) ⊆ V, C is clearly [s]-connected. Thus, C ⊆ C', where C' is an [s]-component and, by Lemma 5.5, C' ⊆ (treecomp(s) − χ(s)) holds. By the "only if" part of this lemma, C' is a [V]-component, therefore C cannot be a proper subset of C', and C = C' actually holds. Thus, C is an [s]-component. □



5.2 A LOGCFL Algorithm Deciding k-bounded Hypertree-Width

Figure 7 shows the algorithm k-decomp, deciding whether a given conjunctive query Q has a k-bounded hypertree-width decomposition. In that figure, we give a high level description of an alternating algorithm, to be run on an alternating Turing machine (ATM). The details of how the algorithm can be effectively implemented on a logspace ATM will be given later (see Lemma 5.14).

To each computation tree τ of k-decomp on input query Q, we associate a hypertree δ(τ) = ⟨T, χ, λ⟩, called the witness tree of τ, defined as follows: For any existential configuration of τ corresponding to the "guess" of some set S ⊆ atoms(Q) during the computation of k-decomposable(C, R), for some [var(R)]-component C (i.e., to Step 1 of k-decomp), T contains a vertex s. In particular, the vertex s_0 guessed at the initial call k-decomposable(var(Q), ∅) is the root of T.

There is an edge between vertices r and s of T, where s ≠ s_0, if S is guessed at Step 1 during the computation of k-decomposable(C, R), for some [var(R)]-component C (S and R are the (guessed) sets of atoms of τ corresponding to s and r in T, respectively). We will denote C by comp(s), and r by father(s). Moreover, for the root s_0 of T, we define comp(s_0) = var(Q).

The vertices of T are labeled as follows. λ(s) = S (i.e., λ(s) is the guessed set S of atoms corresponding to s), for any vertex s of T. If s_0 = root(T), let χ(s_0) = var(λ(s_0)); for any other vertex s, let χ(s) = var(λ(s)) ∩ (χ(r) ∪ C), where r = father(s) and C = comp(s).
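The χ labeling of a witness tree is determined top-down by λ and comp, which the following sketch tries to make explicit. It is a minimal illustration, assuming a tree given as a dictionary from each vertex to its list of children; the function name `label_witness_tree` and the example data are hypothetical and serve only to show the rule χ(s) = var(λ(s)) ∩ (χ(father(s)) ∪ comp(s)).

```python
def label_witness_tree(children, root, lam, comp, atom_vars):
    """Compute the chi labels of a witness tree from lambda and comp.

    children:  dict vertex -> list of child vertices (assumed encoding)
    lam:       dict vertex -> set of atom names guessed at that vertex
    comp:      dict vertex -> set of variables (the component handled there)
    atom_vars: dict atom name -> set of variables of that atom
    """
    def var(atom_names):
        return set().union(*(atom_vars[a] for a in atom_names))

    # The root keeps every variable of its atoms.
    chi = {root: var(lam[root])}
    stack = [root]
    while stack:
        r = stack.pop()
        for s in children.get(r, []):
            # Keep only variables seen at the father or inside comp(s).
            chi[s] = var(lam[s]) & (chi[r] | comp[s])
            stack.append(s)
    return chi


# Path query a(X,Y), b(Y,Z), c(Z,W) with a width-2 guess at the last vertex.
atom_vars = {"a": {"X", "Y"}, "b": {"Y", "Z"}, "c": {"Z", "W"}}
children = {"s0": ["s1"], "s1": ["s2"]}
lam = {"s0": {"a"}, "s1": {"b"}, "s2": {"c", "a"}}
comp = {"s0": {"X", "Y", "Z", "W"}, "s1": {"Z", "W"}, "s2": {"W"}}
chi = label_witness_tree(children, "s0", lam, comp, atom_vars)
```

In this example the vertex s2 uses the atom a in its λ label, yet X is projected out of χ(s2) because X occurs neither in χ(s1) nor in comp(s2); this is what keeps the occurrences of X connected in the tree.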

Lemma 5.9 For any given query Q such that hw(Q) ≤ k, k-decomp accepts Q. Moreover, for any c ≤ k, each c-width hypertree decomposition of Q in normal form is equal to some witness tree for Q.

Proof. Let HD = ⟨T, χ, λ⟩ be a c-width NF hypertree decomposition of a conjunctive query Q, where c ≤ k. We show that there exists an accepting computation tree τ for k-decomp on input query Q such that δ(τ) = ⟨T', χ', λ'⟩ "coincides" with HD. Formally, there exists a bijection f : vertices(T) → vertices(T') such that, for any pair of vertices p, q ∈ T, p is a child of q in T iff f(p) is a child of f(q) in T', λ(p) = λ'(f(p)), λ(q) = λ'(f(q)), χ(p) = χ'(f(p)), and χ(q) = χ'(f(q)).

To this aim, we impose to k-decomp on input Q the following choices of sets S in Step 1:

ALTERNATING ALGORITHM k-decomp
Input: A non-empty query Q.
Result: "Accept", if Q has k-bounded hypertree width; "Reject", otherwise.

Procedure k-decomposable(C_R: SetOfVariables, R: SetOfAtoms)
begin
  1) Guess a set S ⊆ atoms(Q) of k elements at most;
  2) Check that all the following conditions hold:
     2.a) ∀P ∈ atoms(C_R), (var(P) ∩ var(R)) ⊆ var(S) and
     2.b) var(S) ∩ C_R ≠ ∅
  3) If the check above fails Then Halt and Reject; Else
     Let 𝒞 := {C ⊆ var(Q) | C is a [var(S)]-component and C ⊆ C_R};
  4) If, for each C ∈ 𝒞, k-decomposable(C, S)
     Then Accept
     Else Reject
end;

begin (* MAIN *)
  Accept if k-decomposable(var(Q), ∅)
end.

Figure 7: A non-deterministic algorithm deciding k-bounded hypertree-width

a) For the initial call k-decomposable(var(Q), ∅), the set S chosen in Step 1 is λ(root(T)).

b) Otherwise, for a call k-decomposable(C_R, R), if R is the label λ(r) of some vertex r, and if r has a child s such that treecomp(s) = C_R, then choose S = λ(s) in Step 1.

We use structural induction on trees to prove that, for any vertex r ∈ vertices(T), if we denote f(r) by r', the following equivalences hold: λ(r) = λ'(r'); treecomp(r) = comp(r'); and χ(r) = χ'(r').

Basis: For r = root(T), we set f(root(T)) := root(T'). Thus, by choosing λ'(f(r)) = λ(r) as described at Point a) above, all the equivalences trivially hold.

Induction Step: Assume that the equivalences hold for some vertex r ∈ vertices(T). Then, we will show that the statement also holds for every child of r. Let r' ∈ vertices(T') denote f(r), and let s be any child of r in T. By Lemma 5.6 and Lemma 5.8, the [r]-component treecomp(s) coincides with some [var(λ'(r'))]-component comp(s') corresponding to the call k-decomposable(comp(s'), λ'(r')) that generated a child s' of r', which we define to be the image of s, i.e., we set f(s) := s'. Since HD is a k-width hypertree decomposition, and the induction hypothesis holds, it easily follows that, by choosing λ(s) = λ'(s') as prescribed at Point b) above, no check performed in Step 2 of the call k-decomposable(comp(s'), λ'(r')) can fail.

Next we show that χ(s) = χ'(s'). Let C = comp(s') = treecomp(s), and V = var(λ(s)) = var(λ'(s')). By Condition 4 of Definition 4.1, V ∩ χ(T_s) ⊆ χ(s) holds. Since HD is in normal form, we can replace χ(T_s) by C ∪ χ(s) according to Condition 1 of Definition 5.1, and we get V ∩ (C ∪ χ(s)) ⊆ χ(s). Hence, we obtain the following property

(1) V ∩ C ⊆ χ(s).

Now, consider χ'(s'). By definition of witness tree, χ'(s') = V ∩ (χ'(r') ∪ C) = V ∩ (χ(r) ∪ C). By Property (1) above, V ∩ C ⊆ χ(s). Moreover, HD is in NF, and Condition 3 of the normal form entails that (V ∩ χ(r)) ⊆ χ(s). As a consequence, χ'(s') ⊆ χ(s). We claim that this inclusion cannot be proper. Indeed, by definition of χ'(s'), if χ'(s') ⊂ χ(s), there exists a variable Y ∈ χ(s) which belongs neither to χ(r), nor to C. However, this entails that Y belongs to some other [r]-component and thus s violates Condition 1 of the normal form.

In summary, k-decomp accepts Q with the accepting computation tree τ determined by the choices described above, and its witness tree δ(τ) is a c-width hypertree decomposition of Q in normal form. □

Lemma 5.10 Assume that k-decomp accepts an input query Q with an accepting computation tree τ and let δ(τ) = ⟨T, χ, λ⟩ be the corresponding witness tree. Then, for any vertex s of T:

a) if s ≠ root(T), then comp(s) is a [father(s)]-component;

b) for any C ⊆ comp(s), C is an [s]-component if and only if C is a [var(λ(s))]-component.

Proof. We use structural induction on the tree T.

Basis: Both parts of the lemma trivially hold if s is the root of T. In fact, in this case, we have χ(s) = var(λ(s)), by definition of witness tree.

Induction Step: Assume that the lemma holds for some vertex r ∈ vertices(T). Then, we will show that both parts hold for every child of r. The induction hypothesis states that any [var(λ(r))]-component included in comp(r) is an [r]-component included in comp(r), and vice versa. Moreover, if r ≠ root(T), then comp(r) is a [father(r)]-component; otherwise, i.e., if r is the root, comp(r) = var(Q), by definition of witness tree. Let s ∈ vertices(T) be a child of r and let V = var(λ(s)). We first observe that, by definition of the variable labeling χ of the witness tree, it follows that var(λ(s)) ∩ comp(s) ⊆ χ(s). Hence, the following holds.

Fact 1: (V − χ(s)) ∩ comp(s) = ∅.

(Point a.) Immediately follows by the definition of comp(s) and by the induction hypothesis. Indeed, r is the father of s and by the induction hypothesis any [var(λ(r))]-component included in comp(r) is an [r]-component included in comp(r). Thus, in particular, comp(s) is an [r]-component.

(Only if part, Point b.) Assume that a set of variables C ⊆ comp(s) is an [s]-component. By Fact 1, C ∩ (V − χ(s)) = ∅ holds, and for any pair of variables {X, Y} ⊆ C, X [s]-adjacent to Y entails X [V]-adjacent to Y. Hence, C is a [V]-connected set of variables. Moreover, χ(s) ⊆ V. Then, any [V]-connected set which is a maximal [s]-connected set is a maximal [V]-connected set as well, and thus C is a [V]-component.

(If part, Point b.) We proceed by contradiction. Assume C ⊆ comp(s) is a [V]-component, but C is not an [s]-component, i.e., C is not a maximal [s]-connected set of variables. Since χ(s) ⊆ V, C is clearly [s]-connected, hence it is not maximal. That is, there exists a pair of variables X ∈ C and Y ∉ C such that X is [s]-adjacent to Y, but X is not [var(λ(s))]-adjacent to Y. Let A be any atom proving their adjacency w.r.t. s, i.e., {X, Y} ⊆ var(A) − χ(s). Hence, because X ∈ C and X is not [V]-adjacent to Y, it follows that Y ∈ (V − χ(s)). By Fact 1, (V − χ(s)) ∩ comp(s) = ∅, therefore Y ∉ comp(s). In summary, X ∈ comp(s) and Y ∉ comp(s). Moreover, comp(s) ⊆ comp(r), by Step 4 of k-decomp. Hence, by induction hypothesis, comp(s) is an [r]-component and thus X is not [r]-adjacent to Y. Consider again the atom A. We get {X, Y} ⊄ var(A) − χ(r). Since X ∈ comp(s), the variable Y must belong to χ(r). However, by definition of witness tree, Y ∈ χ(r) and Y ∈ var(λ(s)) entail that Y ∈ χ(s), which is a contradiction. □

Lemma 5.11 Assume that k-decomp accepts an input query Q with an accepting computation tree τ. Let δ(τ) = ⟨T, χ, λ⟩ be the corresponding witness tree, and s ∈ vertices(T). Then, for each vertex v ∈ T_s:

χ(v) ⊆ comp(s) ∪ χ(s)
comp(v) ⊆ comp(s)

Proof. We use induction on the distance d(v, s) between any vertex v ∈ vertices(T_s) and s. The basis is trivial, since d(v, s) = 0 means v = s.

Induction Step. Assume both statements hold for distance n. Let v ∈ vertices(T_s) be a vertex such that d(v, s) = n + 1. Let v' be the father of v in T_s. Clearly, d(v', s) = n, thus

(a) χ(v') ⊆ (comp(s) ∪ χ(s)); and
(b) comp(v') ⊆ comp(s).

v is generated by some call k-decomposable(comp(v), λ(v')). By the choice of v and the definition of witness tree, it must hold (a') χ(v) ⊆ (comp(v) ∪ χ(v')), and by Step 4 of the call k-decomposable(comp(v'), λ(father(v'))) we get (b') comp(v) ⊆ comp(v'). By (a') and (b'), we obtain (a'') χ(v) ⊆ (comp(v') ∪ χ(v')). By (a''), (b), and (a) we get χ(v) ⊆ (comp(s) ∪ χ(s)). Moreover, (b) and (b') yield comp(v) ⊆ comp(s). □



Lemma 5.12 If k-decomp accepts an input query Q, then hw(Q) ≤ k. Moreover, each witness tree for Q is a c-width hypertree decomposition of Q in normal form, where c ≤ k.

Proof. Assume that τ is an accepting computation tree of k-decomp on input query Q. We show that δ(τ) = ⟨T, χ, λ⟩ is an NF c-width hypertree decomposition of Q, for some c ≤ k.

First, we will prove that δ(τ) fulfils all the properties of Definition 4.1 and is thus a hypertree decomposition of Q.

Property 1: ∀A ∈ atoms(Q) ∃v ∈ vertices(T) s.t. var(A) ⊆ χ(v).
We first prove the following claim.

CLAIM A: Let s be any vertex of T, and let C_r = comp(s). Then, for each P ∈ atoms(C_r) it holds that

(a) ∀A ∈ (atoms(Q) − atoms(C_r)), (var(P) ∩ var(A)) ⊆ χ(s); and

(b) either var(P) ⊆ χ(s) or there exists an [s]-component C_s ⊆ C_r such that P ∈ atoms(C_s).



Proof of Claim A. (Part a). We use structural induction on the tree T.

Basis: Part (a) of the claim trivially holds if s is the root of T. In fact, in this case, we have C_r = comp(s) = var(Q) and hence atoms(C_r) = atoms(Q).

Induction Step: Assume that Part (a) holds for some vertex s ∈ vertices(T). Then, we will show that the statement also holds for every child of s. Let C_r = comp(s) and V = var(λ(s)). The induction hypothesis states that ∀P ∈ atoms(C_r) and ∀A ∉ atoms(C_r), var(P) ∩ var(A) ⊆ χ(s). By Step 4 of k-decomp, for each [V]-component C s.t. C ⊆ C_r, T contains a vertex q such that comp(q) = C and father(q) = s. Moreover, by Lemma 5.10, C is an [s]-component. Let C_s be an [s]-component s.t. C_s ⊆ C_r, and let P' belong to atoms(C_s). By choice of C_s ⊆ C_r, P' also belongs to atoms(C_r). First note that, ∀A ∉ atoms(C_s), we have

(1) var(P') ∩ var(A) ⊆ χ(s).

Indeed, if var(A) ⊆ χ(s), (1) is trivial, and if A ∉ atoms(C_r), it follows from the induction hypothesis. Otherwise, i.e., if A contains some variable belonging to another [s]-component included in C_r, it immediately follows by definition of [s]-component. Now, for the component C_s, Step 1 of k-decomposable(C_s, λ(s)) guesses a vertex s' such that, ∀P ∈ atoms(C_s), (var(P) ∩ var(λ(s))) ⊆ var(λ(s')). In particular, (var(P') ∩ var(λ(s))) ⊆ var(λ(s')). Because χ(s) ⊆ var(λ(s)), this yields (var(P') ∩ χ(s)) ⊆ (var(λ(s')) ∩ χ(s)). By definition of witness tree, (var(λ(s')) ∩ χ(s)) ⊆ χ(s'), hence we get (var(P') ∩ χ(s)) ⊆ χ(s'). By combining this result with relationship (1) above, we get that ∀A ∉ atoms(C_s), (var(P') ∩ var(A)) ⊆ (var(P') ∩ χ(s)) ⊆ χ(s'). Hence, Part (a) of the claim holds even for s' and thus for every child of s in T.

(Part b). Let s be any vertex of T, let C_r = comp(s), and let P belong to atoms(C_r). Assume that var(P) ⊄ χ(s) and that P ∈ atoms(C'_s), where C'_s is an [s]-component not included in C_r, i.e., C'_s ⊄ C_r. Then, there exists a variable Y ∈ C'_s s.t. Y ∉ C_r and there is an [s]-path π from Y to any variable X ∈ (var(P) − χ(s)).

Let A be an atom belonging to both atoms(C_r) and atoms(C'_s). Then, var(A) − χ(r) ⊆ C_r and var(A) − χ(s) ⊆ C'_s hold. As a consequence, (var(A) ∩ C'_s) ⊆ C_r. Indeed, if this is not true, there exists a variable Z ∈ (var(A) − χ(s)) such that Z ∈ χ(r). By the definition of the χ labeling of a witness tree, this entails that Z ∈ (var(A) ∩ var(λ(r))), but Z ∉ var(λ(s)), which contradicts the fact that A satisfies the condition checked at Step 2.a of k-decomp, because τ is an accepting computation tree.

Therefore, there exist two atoms {Q', P'} ⊆ atoms(π) belonging to atoms(C'_s) and adjacent in π s.t. Q' ∉ atoms(C_r), P' ∈ atoms(C_r), and var(Q') ∩ var(P') ⊄ χ(s). However, this contradicts Part (a) of the claim. ◇



Note that, by Lemma 5.10, in Step 4 of any call k-decomposable(C, λ(r)) of an accepting computation of k-decomp, [s]-components included in C and [var(λ(s))]-components included in C coincide. Thus, Property 1 follows by inductive application of Part (b) of Claim A. In fact, Part (b) of the claim applied to the root s_0 of T states that, ∀A ∈ atoms(Q), either var(A) ⊆ χ(s_0), or A ∈ atoms(C_{s_0}) for some [s_0]-component C_{s_0} of Q that will be further treated in Step 4 of the algorithm. Thus, var(A) is covered eventually by some chosen set of atoms S, i.e., there exists some vertex s of T such that λ(s) = S and var(A) ⊆ χ(s).

Property 2: For each variable Y ∈ var(Q) the set {v ∈ vertices(T) | Y ∈ χ(v)} induces a connected subtree of T.

Assume that Property 2 does not hold. Then, there exists a variable Y ∈ var(Q) and two vertices v_1 and v_2 of T such that Y ∈ (χ(v_1) ∩ χ(v_2)) but the unique path from v_1 to v_2 in T contains a vertex w such that Y ∉ χ(w). W.l.o.g., assume that v_1 is adjacent to w and that v_2 is a descendant of w in T, i.e., v_2 ∈ vertices(T_w). There are two possibilities to consider:

• v_1 is a child of w and v_2 belongs to the subtree T_p of another child p of w. However, this would mean that, by Step 4 of k-decomp and by Lemma 5.11, the variables in sets V_1 = (χ(v_1) − χ(w)) and V_2 = (χ(v_2) − χ(w)) belong to distinct [w]-components. But this is not possible, because Y ∈ (V_1 ∩ V_2).

• w is a child of v_1 and v_2 belongs to the subtree T_w of T rooted at w. Then, λ(w) was chosen as set S in Step 1 of k-decomposable(C, λ(v_1)), where C is a [v_1]-component. Note that Y ∈ χ(v_1) entails Y ∉ C, by definition of [v_1]-component. Since v_2 belongs to the subtree T_w, by Lemma 5.11 it holds that χ(v_2) ⊆ (C ∪ χ(w)). This is a contradiction, because Y ∈ χ(v_2), but Y belongs neither to χ(w), nor to C.

Property 3: ∀p ∈ vertices(T), χ(p) ⊆ var(λ(p)).
Follows by definition of the χ labeling of a witness tree.

Property 4: ∀p ∈ vertices(T), var(λ(p)) ∩ χ(T_p) ⊆ χ(p).
Let v be any vertex in T_p, and let V = var(λ(p)). By Lemma 5.11, χ(v) ⊆ comp(p) ∪ χ(p). Hence, V ∩ χ(v) ⊆ (V ∩ comp(p)) ∪ χ(p), because Property 3 holds for p. However, by definition of witness tree, (V ∩ comp(p)) ⊆ χ(p), and thus (V ∩ χ(v)) ⊆ χ(p).

Thus, δ(τ) is a hypertree decomposition of Q. Let c be the width of δ(τ). Since Step 1 of k-decomp only chooses sets of atoms having cardinality bounded by k, c ≤ k holds.

Moreover, δ(τ) is in normal form. Indeed, Condition 2 and Condition 3 of Definition 5.1 hold by Step 2.b of k-decomp, and by definition of the χ labeling of a witness tree. Finally, since δ(τ) is a hypertree decomposition, by Lemma 5.2, Lemma 5.11, and the definition of the χ labeling of a witness tree, we get that Condition 1 holds for δ(τ), too. □



By combining Lemma 5.9 and Lemma 5.12 we get 



Theorem 5.13 k-decomp accepts an input query Q if and only if hw(Q) ≤ k.

Lemma 5.14 k-decomp can be implemented on a logspace ATM having polynomially bounded tree-size.

Proof. Let us refer to logspace ATMs with polynomially bounded tree-size as LOGCFL-ATMs. We will outline how the algorithm k-decomp can be implemented on a LOGCFL-ATM M.

We first describe the data structures used by M. Instead of manipulating atoms directly, indices of atoms will be used in order to meet the logarithmic space bound. Thus the i-th atom occurring in the given representation of the input query Q will be represented by integer i. Sets of at most k atoms, k-sets for short, are represented by k-tuples of integers; since k is fixed, representing such sets requires logarithmic space only. Variables are represented as integers, too.

If R is a k-set, then a [var(R)]-component C is represented by a pair (rep(R), first(C)), where rep(R) is the representation of the k-set R, and first(C) is the smallest integer representing a variable of the component C. For example, the ∅-component var(Q) is represented by the pair (rep(∅), 1). It is thus clear that [var(R)]-components can be represented in logarithmic space, too.
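The pair representation (rep(R), first(C)) can be illustrated with a short sketch. It assumes that atoms are stored in a list of integer variable sets and are indexed from 0 (the text counts atoms from 1); the function names `expand` and `encode` are hypothetical. Unlike the logspace machine, this sketch stores the whole component in memory, so it only illustrates the encoding, not the space bound.

```python
from collections import deque


def expand(atoms, k_set, first):
    """List the variables of the [var(R)]-component that contains `first`.

    atoms: list of sets of integer variables; atom i is atoms[i] (assumed encoding)
    k_set: tuple of atom indices playing the role of rep(R)
    """
    separator = set().union(*(atoms[i] for i in k_set))
    if first in separator:
        raise ValueError("a component never contains a variable of var(R)")
    seen = {first}
    queue = deque([first])
    while queue:
        x = queue.popleft()
        for atom in atoms:
            if x in atom:
                # Every variable of this atom outside var(R) is adjacent to x.
                for y in atom - separator:
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
    return sorted(seen)


def encode(atoms, k_set, variable):
    """Return the pair (rep(R), first(C)) for the component of `variable`."""
    return (tuple(sorted(k_set)), min(expand(atoms, k_set, variable)))


# Four-atom cycle over the variables 1..4; R consists of the atom {2, 3}.
atoms = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
members = expand(atoms, (1,), 4)
pair = encode(atoms, (1,), 4)
```

Here the component of variable 4 with respect to the atom {2, 3} is {1, 4}, and it is encoded by the atom index tuple together with its smallest variable 1.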

The main data structures carried with each configuration of M consist of (the representations of):

• a k-set R,⁴

• a [var(R)]-component C_R,

• a k-set S, and

• a [var(S)]-component C.

Not all these items will contain useful data in all configurations. We do not describe further auxiliary logspace 
data structures that may be used for control tasks and for other tasks such as counting or for performing some 
of the SL subtasks described below. 

We are now ready to give a description of the computation M performs on an input query Q. 

To facilitate the description, we will specify some subtasks of the computation, that are themselves solvable 
in LOGCFL, as macro-steps without describing their corresponding computation (sub-)trees. We may imagine 
a macro-step as a special kind of configuration - termed oracle configuration - that acts as an oracle for the 
subtask to be solved. 

Each oracle configuration can be normal or converse. 

A normal oracle configuration has the following effect. If the subtask is negative, this configuration has no 
children and amounts to a REJECT. Otherwise, its value (ACCEPT or REJECT) is identical to the value of its 
unique successor configuration. 

A converse oracle configuration has the following effect. If the subtask is negative, this configuration has 
no children and amounts to an ACCEPT. Otherwise, its value (ACCEPT or REJECT) is identical to the value 
of its unique successor configuration. 

From the definition of logspace ATM with polynomial tree-size, it follows that any polynomially tree-sized 
logspace ATM M with LOGCFL oracle configurations (where an oracle configuration contributes 1 to the 
size of an accepting subtree) is equivalent to a standard logspace ATM having polynomial tree size. 

M is started with R initialized to the empty set and C_R having value var(Q).

We describe the evolution of M corresponding to a procedure call k-decomposable(C_R, R).

Instruction 1 is performed by guessing an arbitrary k-set S of atoms. 

The "Guess" phase of Instruction 1 is implemented by an existential configuration of the ATM. (Actually, 
it is implemented by a subtree of existential configurations, given that a single existential configuration can 
only guess one bit; note however, that each accepting computation tree will contain only one branch of this 
subtree.) 

4 The separate representation of R is actually slightly redundant, given that R also occurs in the description of the [R]-component C_R.

Checking Step 2 is in symmetric logspace (SL). The most difficult task is to enumerate the atoms of atoms(C_R), which in turn - as most substantial subtask - requires to enumerate the variables of C_R. Remember that C_R is given in the form (rep(R), i) as described above. Thus, enumerating C_R amounts to cycling over all variables j and checking whether j is [R]-connected to i. The latter subtask is easily seen to be in SL because it essentially amounts to a connectedness-test of two vertices in an undirected graph. It follows that the entire checking-task of Instruction 2 is in SL. Since SL ⊆ LOGCFL, this corresponds to a LOGCFL-subtask. We can thus assume that the checking-task is performed by some normal oracle configuration. If the oracle computation fails at some branch, the branch ends in a REJECT, otherwise, the guessed k-set S corresponding to that branch satisfies all the conditions checked by Step 2 of k-decomp.

Steps 3 and 4 together intuitively correspond to a "big" universal configuration that universally quantifies over all subtrees corresponding to the procedure calls k-decomposable(C, S) for all C ∈ 𝒞. This could be realized as follows. First, a subtree of universal configurations enumerates all candidates C_i = (rep(S), i) for 1 ≤ i ≤ |var(Q)|, for [var(S)]-components. Each branch of this subtree (of polynomial depth) computes exactly one candidate C_i. Each such branch is expanded by a converse oracle configuration checking whether C_i is effectively a [var(S)]-component contained in C_R. Thus, branches that do not correspond to such a component are terminated with an ACCEPT configuration (they are of no interest), while all other branches are further expanded. Each branch C_i of the latter type is expanded by the subtree corresponding to the recursive call k-decomposable(C_i, S).

We have thus completely described a logspace ATM M with oracle configurations that implements k-decomp. It is easy to see that this machine has polynomial accepting computation trees. In fact, this is seen from the fact that there exist only a polynomial number of choices for set S in Step 1, and that no such set is chosen twice in any accepting computation tree. □



From the lemmas above and Proposition 2.3, we obtain the following theorem 



Theorem 5.15 Deciding whether a conjunctive query Q has k-bounded hypertree-width is in LOGCFL. 

In fact, the following proposition states that an accepting computation tree of a bounded-treesize logspace 
ATM can be computed in (the functional version of) LOGCFL. 



Proposition 5.16 ([14]) Let M be a bounded-treesize logspace ATM recognizing a language A. It is possible to construct an L^LOGCFL transducer T which for each input w ∈ A outputs a single (polynomially-sized) accepting tree for M and w.



By Lemma 5.9 and Lemma 5.12, we have a one-to-one correspondence between the NF k-width hypertree decompositions and accepting computation trees of k-decomp. Thus, by Proposition 5.16, we get that hypertree decompositions are efficiently computable.

Theorem 5.17 Computing a k-bounded hypertree decomposition (if any) of a conjunctive query Q is in L^LOGCFL, i.e., in functional LOGCFL.

Since LOGCFL is closed under L^LOGCFL reductions [14], the two following statements follow from the



theorem above and Theorem 4.6 and Theorem nJ\, respectively. 



Corollary 5.18 Deciding whether a k-bounded hypertree-width query Q evaluates to true on a database DB 
is LOGCFL-complete. 



Corollary 5.19 The answer of a (non-Boolean) k-bounded hypertree-width query Q can be computed in time 
polynomial in the combined size of the input instance and of the output relation. 
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5.3 A Datalog Program Recognizing Queries of /c-bounded Hypertree- Width 

In this section we show a straightforward polynomial-time implementation of the LOGCFL algorithm above.
In particular, we reduce (in polynomial time) the problem of deciding whether there exists a k-bounded
hypertree-width decomposition of a given conjunctive query Q to the problem of solving a Datalog program.

First, we associate an identifier (e.g., some constant number) with each k-vertex R of Q (a nonempty
subset of Q consisting of at most k atoms), and with each [R]-component C for any k-vertex R of Q.
Moreover, we have a new identifier root, which intuitively will be the root of any possible
tree-decomposition, and a new identifier varQ, which encodes the set of all the variables of the query
and hence is seen as a component including any subset of var(Q).

Then, we compute the following relations:⁵

• k-vertex(·): Contains a tuple (R) for each k-vertex R of Q.

• component(·, ·): Contains a tuple ($C_R$, R) for each [R]-component $C_R$ of some k-vertex R.
Moreover, it contains the tuple (varQ, root).

• meets-condition(·, ·, ·): Contains any tuple (S, R, $C_R$) such that S and R are k-vertices, $C_R$ is
an [R]-component, and the following conditions hold: $var(S) \cap C_R \neq \emptyset$, and
$\forall P \in atoms(C_R):\ var(P) \cap var(R) \subseteq var(S)$.

Moreover, it contains a tuple (S, root, varQ) for any k-vertex S.

• subset(·, ·): Encodes the standard set-inclusion relationship between [R]-components of Q.

Let $\mathcal{P}$ be the following Datalog program:

1. k-decomposable(R, C_R) ← k-vertex(S), meets-condition(S, R, C_R), ¬undecomposable(S, C_R).

2. undecomposable(S, C_R) ← component(C_S, S), subset(C_S, C_R), ¬k-decomposable(S, C_S).

It is easy to see that $hw(Q) \le k$ if and only if $\mathcal{P} \models$ k-decomposable(root, varQ).

Note that $\mathcal{P}$ is locally stratified on the base relations to which it is applied, and it is
clearly evaluable in polynomial time.
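
The two rules can be read as a pair of mutually recursive Boolean functions, which the following Python sketch evaluates top-down with memoization. It is only an illustration, not the authors' implementation. It assumes that a query is given as a mapping from atom names to their variables, and that an [R]-component is a maximal set of variables outside var(R) that are linked through shared atoms. The function name `hw_at_most`, the `ROOT` marker, and the input encoding are our own choices.

```python
from functools import lru_cache
from itertools import combinations

ROOT = None  # hypothetical stand-in for the identifier `root`


def hw_at_most(query, k):
    """Return True iff the two rules derive k-decomposable(root, varQ).

    `query` maps each atom name to the collection of its variables.
    """
    atoms = {name: frozenset(vs) for name, vs in query.items()}
    var_q = frozenset().union(*atoms.values())
    # k-vertex(.): all nonempty sets of at most k atoms.
    k_vertices = [frozenset(c)
                  for size in range(1, k + 1)
                  for c in combinations(sorted(atoms), size)]

    def var(vertex):
        return frozenset().union(*(atoms[a] for a in vertex))

    @lru_cache(maxsize=None)
    def components(vertex):
        """component(., vertex): the [vertex]-components, as variable sets."""
        sep = var(vertex)
        todo, found = set(var_q - sep), []
        while todo:
            comp, stack = set(), [todo.pop()]
            while stack:
                x = stack.pop()
                if x in comp:
                    continue
                comp.add(x)
                for vs in atoms.values():
                    if x in vs:
                        stack.extend(vs - sep - comp)
            todo -= comp
            found.append(frozenset(comp))
        return tuple(found)

    def meets_condition(s, r, c_r):
        if r is ROOT:
            return True  # the tuple (S, root, varQ) exists for every k-vertex S
        var_s, var_r = var(s), var(r)
        return bool(var_s & c_r) and all(
            vs & var_r <= var_s for vs in atoms.values() if vs & c_r)

    @lru_cache(maxsize=None)
    def k_decomposable(r, c_r):  # rule 1
        return any(meets_condition(s, r, c_r) and not undecomposable(s, c_r)
                   for s in k_vertices)

    def undecomposable(s, c_r):  # rule 2
        return any(c_s <= c_r and not k_decomposable(s, c_s)
                   for c_s in components(s))

    return k_decomposable(ROOT, var_q)


triangle = {"r": "XY", "s": "YZ", "t": "ZX"}
path = {"r": "XY", "s": "YZ"}
print(hw_at_most(triangle, 1), hw_at_most(triangle, 2), hw_at_most(path, 1))
```

On the cyclic triangle query the sketch rejects k = 1 and accepts k = 2, while the acyclic two-atom path is accepted for k = 1. The recursion terminates because every component passed to a recursive call is a strict subset of the current one, which mirrors the local stratification noted above.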



6 Bounded Hypertree-Width vs Related Notions 

Many relevant cyclic queries are - in a precise sense - close to acyclic queries because they can be decomposed 
via low bandwidth decompositions to acyclic queries. 

The main classes of bounded-width queries considered in database theory and in artificial intelligence are 
the following: 

• Queries of bounded treewidth. Treewidth is the best-known graph theoretic measure of tree-similarity. 
The concept of treewidth is based on the notion of tree-decomposition of a graph. The concept of 
treewidth is easily generalized to hypergraphs and thus to conjunctive queries. Conjunctive queries of 
bounded treewidth can be answered in polynomial time wtt. For each fixed k, deciding whether a query 
has treewidth k is in LOGCFL. 



• Queries of bounded degree of cyclicity. This concept was introduced by Gyssens et al. [18, 17] and is
based on the notion of hinge-tree decomposition. The smaller the degree of cyclicity of a hypergraph,
the more the hypergraph resembles an acyclic hypergraph. Hypergraphs of bounded treewidth also have
bounded degree of cyclicity, but not vice versa. Queries of bounded degree of cyclicity can be
recognized and processed in polynomial time [17].



5 For the sake of clarity, we directly refer to objects by means of their associated identifiers (which we also use as logical terms in 
the program). 

• Queries of bounded query-width. This notion is based on the concept of query decomposition [Eh.
Any hinge-tree decomposition of width k is also a query decomposition of width k, but not vice-versa. 
Thus query decompositions are the most general (i.e., most liberal) decompositions. It follows from 
results in [Q] that queries of bounded query-width can be answered in polynomial time, once a query 
decomposition is given. 

Thus, the class of queries of bounded query-width is the widest of the above-mentioned classes of tractable
cyclic queries. We next show that this class is properly included in the class of queries of bounded
hypertree-width. More precisely, we show that every k-width query decomposition corresponds to an
equivalent k-width hypertree decomposition, but the converse is not true in general. Recall that hw(Q)
and qw(Q) denote the hypertree width and the query width of a conjunctive query Q, respectively.

Theorem 6.1 

a) For each conjunctive query Q it holds that $hw(Q) \le qw(Q)$.

b) There exist queries Q such that $hw(Q) < qw(Q)$.

Proof. (Point a.) Let Q be a conjunctive query and $(T, \lambda)$ a query decomposition of Q. W.l.o.g., assume
Q is pure (i.e., labels contain only atoms, see Section [OJ). Then, $(T, \chi, \lambda)$ is a hypertree
decomposition of Q, where, for any vertex v of T, $\chi(v)$ consists of the set of variables
$var(\lambda(v))$ occurring in the atoms $\lambda(v)$. Indeed, because the properties of query decompositions
hold for $(T, \lambda)$, $(T, \chi, \lambda)$ satisfies Conditions 1 and 2 of Definition fO| . Conditions 3
and 4 follow immediately, as $\chi(p) = var(\lambda(p))$ by construction. Therefore, $hw(Q) \le qw(Q)$.

(Point b.) The query $Q_1$ of Example |^2] has no query decomposition of width 2, and it is easy to see that
$qw(Q_1) = 3$. However, $hw(Q_1) = 2$, as witnessed by the hypertree decomposition shown in Figure 5. □
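
The construction used for Point a can be sketched in a few lines of Python. This is an illustration under our own encoding, not code from the paper: a pure query decomposition is assumed to be given as a list of tree vertices plus a mapping `lam` from each vertex to its set of atom names, and `atom_vars` maps each atom name to its variables. The tree itself is left unchanged by the construction, so its edges are not needed here.

```python
def to_hypertree_decomposition(vertices, lam, atom_vars):
    """Derive chi from lambda as in Point a: chi(v) = var(lambda(v)).

    Returns the chi labelling and the width, i.e. the largest number of
    atoms in any lambda label.
    """
    chi = {}
    for v in vertices:
        # Union of the variables of all atoms in the label of v.
        chi[v] = frozenset().union(*(atom_vars[a] for a in lam[v]))
    width = max(len(lam[v]) for v in vertices)
    return chi, width


# Hypothetical width-2 query decomposition of a triangle query:
# the root is labelled {r, s} and its single child is labelled {t}.
atom_vars = {"r": {"X", "Y"}, "s": {"Y", "Z"}, "t": {"Z", "X"}}
lam = {"root": {"r", "s"}, "child": {"t"}}
chi, width = to_hypertree_decomposition(["root", "child"], lam, atom_vars)
print(sorted(chi["root"]), sorted(chi["child"]), width)
```

Since the lambda labels are reused as they are, the width of the resulting hypertree decomposition equals the width of the query decomposition it was built from, which is the content of the inequality in Point a.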